Re: initdb --data-checksums

2020-11-09 Thread Michael Paquier
On Mon, Nov 09, 2020 at 06:03:43PM +0100, Paul Förster wrote:
> indeed, it is. Have a look at:
> 
> https://www.postgresql.org/docs/12/app-pgchecksums.html
> 
> Make sure the database is cleanly shut down before doing it.

This tool is really useful with upgrades after pg_upgrade.  Please
note that there is a --progress option, so you can basically know how
long it is going to take until completion.
--
Michael


signature.asc
Description: PGP signature


Re: initdb --data-checksums

2020-11-09 Thread Paul Förster
Hi Matt,

> On 09. Nov, 2020, at 18:00, Matt Zagrabelny  wrote:
> 
> Hello,
> 
> I see the --data-checksums option for initdb. Is it possible to use 
> --data-checksums after the cluster has been initialized? I'm guessing "not", 
> but thought I'd ask.
> 
> I'm running v12 on Debian.
> 
> Thanks for any help!

indeed, it is. Have a look at:

https://www.postgresql.org/docs/12/app-pgchecksums.html

Make sure the database is cleanly shut down before doing it.

Cheers,
Paul



initdb --data-checksums

2020-11-09 Thread Matt Zagrabelny
Hello,

I see the --data-checksums option for initdb. Is it possible to use
--data-checksums after the cluster has been initialized? I'm guessing
"not", but thought I'd ask.

I'm running v12 on Debian.

Thanks for any help!

-m


Re: Using of --data-checksums

2020-04-16 Thread Michael Paquier
On Thu, Apr 16, 2020 at 03:47:34PM -0700, Jeremy Schneider wrote:
> Data checksums are a hard requirement across the entire RDS PostgreSQL
> fleet - we do not allow it to be disabled in RDS. I've definitely seen a
> lot of hard evidence (for example, customer cases I've personally been
> involved in) that it catches problems.

Oh, that's good to know.  Thanks for the information.  I pushed hard
as well to make this a requirement on what I work on.

> I could not exaggerate how useful
> and important I think this feature is: being able to very quickly and
> easily know that a problem originated outside of the PostgreSQL code.

The worst part with checksums disabled is having to tell a customer or
a support staff that you don't actually know from where the problem
comes, what is the actual origin of it, and why you think that the
error you are seeing in the Postgres logs is most likely linked to a
lower-level corruption as there can be many patterns, like broken
btree pages, toast errors, missing attributes in catalogs, failures
with FK references, primary key duplicates, etc.  And people like
to complain a lot about the database being broken because that's a
very sensitive piece and usually more things depend on it.  With
checksums enabled, you still cannot say exactly from where the problem
comes, but you can redirect the complains more easily to the correct
people to help find out what the actual problem is.  Even better, you
can also know if a problem probably comes directly from Postgres and
some backend logic if you don't see a checksum failure (note that
could be as well a misdesigned HA workflow, custom backup script as
well who knows but at least you know that something you control
directly gets wrong).  And the error message provided is clear.

> This was in part what led to that long blog article I wrote about
> checksums, and it's why enabling checksums was happiness hint #1 until I
> broke them into categories.

Reference? ;p
--
Michael


signature.asc
Description: PGP signature


Re: Using of --data-checksums

2020-04-16 Thread Magnus Hagander
On Sun, Apr 12, 2020 at 4:23 PM Tom Lane  wrote:

> Magnus Hagander  writes:
> > And FWIW, I do think we should change the default. And maybe spend some
> > extra effort on the message coming out of pg_upgrade in this case to make
> > it clear to people what their options are and exactly what to do.
>
> Is there any hard evidence of checksums catching problems at all?
> Let alone in sufficient number to make them be on-by-default?
>

I would say yes. I've certainly had a fair number of cases where they've
detected storage corruption, especially with larger SAN type installation.
And coupled with validating the checksum on backup (either with
pg_basebackup or pgbackrest) it enables you to find the errors *early*,
while you can still restore a previous backup and replay WAL to get to a
point where you don't have to lose any data.

I believe both Stephen and David have some good stories they've heard from
people catching such issues with backrest as well.

This and as Michael also points out, it lets you know that the problem
occurred outside of PostgreSQL, makes for very important information when
tracking down issues.

-- 
 Magnus Hagander
 Me: https://www.hagander.net/ 
 Work: https://www.redpill-linpro.com/ 


Re: Using of --data-checksums

2020-04-12 Thread Michael Paquier
On Sun, Apr 12, 2020 at 10:23:24AM -0400, Tom Lane wrote:
> Magnus Hagander  writes:
>> And FWIW, I do think we should change the default. And maybe spend some
>> extra effort on the message coming out of pg_upgrade in this case to make
>> it clear to people what their options are and exactly what to do.
> 
> Is there any hard evidence of checksums catching problems at all?
> Let alone in sufficient number to make them be on-by-default?

I don't know if that's a sufficient number, but I have dealt with
corruption cases on virtual environments where these have been really
essential to find out proof that the origin of the problem was not
Postgres because those bugs created wild and incorrect block
overwrites.  With the software stack getting more complicated, making
them the default would make sense IMO.  Now the case of upgrades is
more tricky than it is, no?  There is a copy of the file so we may be
able to do a block-to-block copy and update of the checksum, but you
cannot do that with the --link mode.
--
Michael


signature.asc
Description: PGP signature


Re: Using of --data-checksums

2020-04-12 Thread Tom Lane
Magnus Hagander  writes:
> And FWIW, I do think we should change the default. And maybe spend some
> extra effort on the message coming out of pg_upgrade in this case to make
> it clear to people what their options are and exactly what to do.

Is there any hard evidence of checksums catching problems at all?
Let alone in sufficient number to make them be on-by-default?

regards, tom lane




Re: Using of --data-checksums

2020-04-12 Thread Magnus Hagander
On Sun, Apr 12, 2020 at 8:05 AM Michael Paquier  wrote:

> On Fri, Apr 10, 2020 at 04:37:46PM -0400, Stephen Frost wrote:
> > There's definitely a lot of reasons to want to have the ability to
> > change an existing cluster.  Considering the complications around
> > running pg_upgrade already, I don't really think that changing the
> > default of initdb would be that big a hurdle for folks to deal with-
> > they'd try the pg_upgrade, get a very quick error that the new cluster
> > has checksums enabled and the old one didn't, and they'd re-initdb the
> > new cluster and then re-run pg_upgrade to figure out what the next issue
> > is..
>
> We discussed that a couple of months ago, and we decided to keep that
> out of the upgrade story, no?  Anyway, if you want to enable or
> disable data checksums on an existing cluster, you always have the
> possibility to use pg_checksums --enable.  This exists in core since
> 12, and there is also a version on out of core for older versions of
> Postgres: https://github.com/credativ/pg_checksums.  On apt-based
> distributions like Debian, this stuff is under the package
> postgresql-12-pg-checksums.
>

The fact that this tool exists, and then in the format of pg_checksums
--disable, I think is what makes the argument to turn on checksums by
default possible. Because it's now very easy and fast to turn it off even
if you've accumulated sizable data in your cluster. (Turning it on in this
case is easy, but not fast).

And FWIW, I do think we should change the default. And maybe spend some
extra effort on the message coming out of pg_upgrade in this case to make
it clear to people what their options are and exactly what to do.

-- 
 Magnus Hagander
 Me: https://www.hagander.net/ <http://www.hagander.net/>
 Work: https://www.redpill-linpro.com/ <http://www.redpill-linpro.com/>


Re: Using of --data-checksums

2020-04-12 Thread Michael Paquier
On Fri, Apr 10, 2020 at 04:37:46PM -0400, Stephen Frost wrote:
> There's definitely a lot of reasons to want to have the ability to
> change an existing cluster.  Considering the complications around
> running pg_upgrade already, I don't really think that changing the
> default of initdb would be that big a hurdle for folks to deal with-
> they'd try the pg_upgrade, get a very quick error that the new cluster
> has checksums enabled and the old one didn't, and they'd re-initdb the
> new cluster and then re-run pg_upgrade to figure out what the next issue
> is..

We discussed that a couple of months ago, and we decided to keep that
out of the upgrade story, no?  Anyway, if you want to enable or
disable data checksums on an existing cluster, you always have the
possibility to use pg_checksums --enable.  This exists in core since
12, and there is also a version on out of core for older versions of
Postgres: https://github.com/credativ/pg_checksums.  On apt-based
distributions like Debian, this stuff is under the package
postgresql-12-pg-checksums.
--
Michael


signature.asc
Description: PGP signature


Re: Using of --data-checksums

2020-04-10 Thread Stephen Frost
Greetings,

* Bruce Momjian (br...@momjian.us) wrote:
> On Wed, Apr  8, 2020 at 11:54:34AM -0400, Stephen Frost wrote:
> > * BGoebel (b.goe...@prisma-computer.de) wrote:
> > > initdb --data-checksums "... help to detect corruption by the I/O system"
> > > There is an (negligible?) impact on performance, ok. 
> > >  
> > > Is there another reason NOT to use this feature ?
> > 
> > Not in my view.
> > 
> > > Has anyone had good or bad experience with the use of  --data-checksums?
> > 
> > Have had good experience with it.  We should really make it the default
> > already.
> 
> Yeah, but I think we wanted more ability to change an existing cluster
> before doing that since it would affect pg_upgraded servers.

There's definitely a lot of reasons to want to have the ability to
change an existing cluster.  Considering the complications around
running pg_upgrade already, I don't really think that changing the
default of initdb would be that big a hurdle for folks to deal with-
they'd try the pg_upgrade, get a very quick error that the new cluster
has checksums enabled and the old one didn't, and they'd re-initdb the
new cluster and then re-run pg_upgrade to figure out what the next issue
is..

Thanks,

Stephen


signature.asc
Description: PGP signature


Re: Using of --data-checksums

2020-04-10 Thread Bruce Momjian
On Wed, Apr  8, 2020 at 11:54:34AM -0400, Stephen Frost wrote:
> Greetings,
> 
> * BGoebel (b.goe...@prisma-computer.de) wrote:
> > initdb --data-checksums "... help to detect corruption by the I/O system"
> > There is an (negligible?) impact on performance, ok. 
> >  
> > Is there another reason NOT to use this feature ?
> 
> Not in my view.
> 
> > Has anyone had good or bad experience with the use of  --data-checksums?
> 
> Have had good experience with it.  We should really make it the default
> already.

Yeah, but I think we wanted more ability to change an existing cluster
before doing that since it would affect pg_upgraded servers.

-- 
  Bruce Momjian  https://momjian.us
  EnterpriseDB https://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+  Ancient Roman grave inscription +




Re: Using of --data-checksums

2020-04-08 Thread Stephen Frost
Greetings,

* BGoebel (b.goe...@prisma-computer.de) wrote:
> initdb --data-checksums "... help to detect corruption by the I/O system"
> There is an (negligible?) impact on performance, ok. 
>  
> Is there another reason NOT to use this feature ?

Not in my view.

> Has anyone had good or bad experience with the use of  --data-checksums?

Have had good experience with it.  We should really make it the default
already.

Thanks,

Stephen


signature.asc
Description: PGP signature


Re: Using of --data-checksums

2020-04-07 Thread Michael Paquier
On Tue, Apr 07, 2020 at 08:10:13AM -0700, BGoebel wrote:
> initdb --data-checksums "... help to detect corruption by the I/O system"
> There is an (negligible?) impact on performance, ok. 
>  
> Is there another reason NOT to use this feature ?
> Has anyone had good or bad experience with the use of  --data-checksums?

FWIW, I have a good experience with it.  Note that some performance
impact of up to ~1% may be noticeable if you have a large number of
buffer evictions from PostgreSQL shared buffer pool, but IMO the
insurance of knowing that Postgres is not the cause of an on-disk
corruption is largely worth it (in applications where I got that
enabled we did not notice any performance impact even in very heavy
production-like workloads, and this even if we had a rather low shared
buffer setting with a much larger set of hot pages, causing the OS
cache to be filled with most of the hot data).
--
Michael


signature.asc
Description: PGP signature


Using of --data-checksums

2020-04-07 Thread BGoebel
initdb --data-checksums "... help to detect corruption by the I/O system"
There is an (negligible?) impact on performance, ok. 
 
Is there another reason NOT to use this feature ?
Has anyone had good or bad experience with the use of  --data-checksums?

Thanks in advance!

Bernhard



--
Sent from: https://www.postgresql-archive.org/PostgreSQL-general-f1843780.html




Re: Recommended way to enable data-checksums on Centos 7?

2019-02-25 Thread Adrian Klaver

On 2/25/19 2:05 PM, Ken Tanzer wrote:
On Mon, Feb 25, 2019 at 1:35 PM Adrian Klaver > wrote:


If I am following correctly, what you want is something
like the below from pg_ctl, correct?:

https://www.postgresql.org/docs/11/app-pg-ctl.html

pg_ctl init[db] [-D datadir] [-s] [-o initdb-options]

...

 >      > checksums are good for data integrity, and can only be
done at init
 >      > time, I wonder if it's worth adding a note about it to that
 >      > documentation page?
 >      >
 > 



Something like that "-o" option would be nice.  But regardless of the 
solution, and given how simple & streamlined the install path/directions 
are, it seems to me to make sense to have some way of doing this that 
doesn't involve having to read through the setup script.  That could be 


From looking at the script running:

postgresql-11-setup --help

will get you that information.

a simpler option, or it could just be a footnote on the install page, 
next to the initdb command, something like:


That would be nice, unfortunately I am not the person to get that done.
Your best bet on this would be to file an issue here:

https://yum.postgresql.org/contact.php

You will need a community account to do this.



Optionally initialize the database and enable automatic start:
   /usr/pgsql-11/bin/postgresql-11-setup initdb (*)
   systemctl enable postgresql-11
   systemctl start postgresql-11

(*) Some options for initializing the database can't be changed later 
(except by wiping out and reinitializing).  You can specify options by 
running export PGSETUP_INITDB_OPTIONS="" before running 
this command.  See 
 for options.

Cheers,
Ken


--
AGENCY Software
A Free Software data system
By and for non-profits
/http://agency-software.org//
/https://demo.agency-software.org/client/
ken.tan...@agency-software.org 
(253) 245-3801

Subscribe to the mailing list 
 to

learn more about AGENCY or
follow the discussion.



--
Adrian Klaver
adrian.kla...@aklaver.com



Re: Recommended way to enable data-checksums on Centos 7?

2019-02-25 Thread Ken Tanzer
On Mon, Feb 25, 2019 at 1:35 PM Adrian Klaver 
wrote:

> If I am following correctly, what you want is something
> like the below from pg_ctl, correct?:
>
> https://www.postgresql.org/docs/11/app-pg-ctl.html
>
> pg_ctl init[db] [-D datadir] [-s] [-o initdb-options]

...

> >  > checksums are good for data integrity, and can only be done at
> init
> >  > time, I wonder if it's worth adding a note about it to that
> >  > documentation page?
> >  >
> >


Something like that "-o" option would be nice.  But regardless of the
solution, and given how simple & streamlined the install path/directions
are, it seems to me to make sense to have some way of doing this that
doesn't involve having to read through the setup script.  That could be a
simpler option, or it could just be a footnote on the install page, next to
the initdb command, something like:

Optionally initialize the database and enable automatic start:
  /usr/pgsql-11/bin/postgresql-11-setup initdb (*)
  systemctl enable postgresql-11
  systemctl start postgresql-11

(*) Some options for initializing the database can't be changed later
(except by wiping out and reinitializing).  You can specify options by
running export PGSETUP_INITDB_OPTIONS="" before running this
command.  See  for
options.

Cheers,
Ken


-- 
AGENCY Software
A Free Software data system
By and for non-profits
*http://agency-software.org/ *
*https://demo.agency-software.org/client
*
ken.tan...@agency-software.org
(253) 245-3801

Subscribe to the mailing list
 to
learn more about AGENCY or
follow the discussion.


Re: Recommended way to enable data-checksums on Centos 7?

2019-02-25 Thread Adrian Klaver

On 2/25/19 11:19 AM, Ken Tanzer wrote:



On Sun, Feb 24, 2019 at 5:03 PM Adrian Klaver > wrote:


On 2/24/19 2:39 PM, Ken Tanzer wrote: 

 > I'm just wondering if there is a more preferred way to do this, 

Not seeing anything responsive to this question, I'll assume that 
PGSETUP_INITDB_OPTIONS is the preferred method.


From looking at the script I would yes.




and/or
 > any particular reason you can't pass options to initdb?  Also,
since the

You can pass options to initdb:

Yes.  I guess I shoulda said "cant pass option to initdb via the setup 
script."


Technically using PGSETUP_INITDB_OPTIONS does pass the initdb options to 
the script:) If I am following correctly, what you want is something 
like the below from pg_ctl, correct?:


https://www.postgresql.org/docs/11/app-pg-ctl.html

pg_ctl init[db] [-D datadir] [-s] [-o initdb-options]




It seems you cannot pass them to the script postgresql-11-setup. That
would be a question for the packagers:

https://yum.postgresql.org/contact.php


 > checksums are good for data integrity, and can only be done at init
 > time, I wonder if it's worth adding a note about it to that
 > documentation page?
 >


OK.  Do the yum packagers also maintain the documentation for 
downloading and installing Postgresql?
(i.e., https://www.postgresql.org/download/linux/redhat/)  And are you 
saying this question/topic is inappropriate for this list?


The question is not inappropriate as the above is the community site and 
this is the community list for general questions. It is just that the RH 
family packages are maintained outside of the core source by the folks 
listed at the contacts link. It is they who in the end will or will not 
make any changes. Just a matter of circles within circles in the 
'community'.




Thanks,
Ken

--
AGENCY Software
A Free Software data system
By and for non-profits
/http://agency-software.org//
/https://demo.agency-software.org/client/
ken.tan...@agency-software.org 
(253) 245-3801

Subscribe to the mailing list 
 to

learn more about AGENCY or
follow the discussion.



--
Adrian Klaver
adrian.kla...@aklaver.com



Re: Recommended way to enable data-checksums on Centos 7?

2019-02-25 Thread Ken Tanzer
On Sun, Feb 24, 2019 at 5:03 PM Adrian Klaver 
wrote:

> On 2/24/19 2:39 PM, Ken Tanzer wrote:

> I'm just wondering if there is a more preferred way to do this,

Not seeing anything responsive to this question, I'll assume that
PGSETUP_INITDB_OPTIONS
is the preferred method.



> and/or
> > any particular reason you can't pass options to initdb?  Also, since the
>
> You can pass options to initdb:
>
Yes.  I guess I shoulda said "cant pass option to initdb via the setup
script."



>
> It seems you cannot pass them to the script postgresql-11-setup. That
> would be a question for the packagers:
>
> https://yum.postgresql.org/contact.php
>
>
> > checksums are good for data integrity, and can only be done at init
> > time, I wonder if it's worth adding a note about it to that
> > documentation page?
> >
>

OK.  Do the yum packagers also maintain the documentation for downloading
and installing Postgresql?
(i.e., https://www.postgresql.org/download/linux/redhat/)  And are you
saying this question/topic is inappropriate for this list?

Thanks,
Ken


-- 
AGENCY Software
A Free Software data system
By and for non-profits
*http://agency-software.org/ *
*https://demo.agency-software.org/client
*
ken.tan...@agency-software.org
(253) 245-3801

Subscribe to the mailing list
 to
learn more about AGENCY or
follow the discussion.


Recommended way to enable data-checksums on Centos 7?

2019-02-24 Thread Ken Tanzer
Hi.  I recently installed PG 11.2 on Centos 7, following the excellent
directions at https://www.postgresql.org/download/linux/redhat/.

I wanted to enable data-checksums.  I at first tried appending
--data-checksums to the doc-specified command:

/usr/pgsql-11/bin/postgresql-11-setup initdb --data-checksums

but that did not work.  After a glance at that script, I was able to do it
with:

export PGSETUP_INITDB_OPTIONS="--data-checksums"
/usr/pgsql-11/bin/postgresql-11-setup initdb

I'm just wondering if there is a more preferred way to do this, and/or any
particular reason you can't pass options to initdb?  Also, since the
checksums are good for data integrity, and can only be done at init time, I
wonder if it's worth adding a note about it to that documentation page?

Thanks.

Ken


-- 
AGENCY Software
A Free Software data system
By and for non-profits
*http://agency-software.org/ <http://agency-software.org/>*
*https://demo.agency-software.org/client
<https://demo.agency-software.org/client>*
ken.tan...@agency-software.org
(253) 245-3801

Subscribe to the mailing list
 to
learn more about AGENCY or
follow the discussion.


Re: Sv: Re: Sv: Re: Sv: Re: Sv: Re: data-checksums

2018-01-14 Thread Peter J. Holzer
On 2018-01-10 11:39:21 -0800, Andres Freund wrote:
> On 2018-01-09 20:51:17 -0500, Stephen Frost wrote:
> > * Andreas Joseph Krogh (andr...@visena.com) wrote:
> > > Aha, so enabling CRC causes hint-bits to be written causing extra 
> > > WAL-logging, 
> > > which woudn't be the case without CRC enabled?
> > > Thanks for pointing that out.
> > 
> > Yes, having checksums enabled forces logging of hint bits.  You can
> > enable wal_log_hints independently too, without having checksums, to see
> > what kind of an impact it'll have on your environment.
> > 
> > A useful documentation update might be:
> > 
> > ---
> > With checksums enabled, wal_log_hints 
> > will be enabled and each page read or write will involve calculating the
> > checksum for the page.
> > ---
> > 
> > I'd probably just replace the "Enabling checksums may incur a noticeable
> > performance penalty" with the above, as it should be clear that doing
> > more work implies an impact on performance and that avoids the whole
> > question of trying to characterize in a general way something that can't
> > be generalized (as it's workload dependent).
> 
> -1. I think this is underplaying the cost.

I disagree. At least two people in this thread interpreted "noticable"
as "measurable but negligible" (which I personally find a bit
surprising). Computing the CRC basically means reading the whole page
from RAM which should be fast compared to a transfer to or from disk. So
it is easy to disregard this sentence as "was probably written when a
Pentium II was new and never updated". Stephen's version draws attention
to the fact that enabling CRCs may cause extra disk writes, which rings
a much louder alarm bell for me. (When are those hint-bits set? Does this
happen often when otherwise no write would have been necessary? I have
no idea so I guess I'd better measure it!)

hp

-- 
   _  | Peter J. Holzer| we build much bigger, better disasters now
|_|_) || because we have much more sophisticated
| |   | h...@hjp.at | management tools.
__/   | http://www.hjp.at/ | -- Ross Anderson 


signature.asc
Description: PGP signature


Sv: Re: Sv: Re: Sv: Re: Sv: Re: Sv: Re: data-checksums

2018-01-10 Thread Andreas Joseph Krogh
På onsdag 10. januar 2018 kl. 22:36:42, skrev Jeff Janes <jeff.ja...@gmail.com 
<mailto:jeff.ja...@gmail.com>>:
On Wed, Jan 10, 2018 at 12:23 PM, Thomas Poty <thomas.p...@gmail.com 
<mailto:thomas.p...@gmail.com>> wrote: Hello, A question seems to be, according 
to me, important :
How a corruption, detected thanks to data-checksums, is fixed?

 
Take two full cold backups of the current mess you have, including the 
executables, and lock one of them away where you can't accidentally do 
something to make it worse.
 
Replace the hardware (or fix the software bug) which lead to this, so it 
doesn't eat more of your data than it already has.
 
If you have a wal archive, then restore from the most recent backup and 
recover it forward with the WAL, hoping your hardware problem hasn't polluted 
that as well.  Make sure it rolls forward as far as you think it should.  If 
you think it rolled forward all the way, then you are probably done.  I'd take 
a full cold backup as well as as full pg_dump(all) at this point and lock it 
away for future forensics, just in case, and also to see if any more errors are 
found by the pg_dump.
 
I'd also extract the damaged block and inspect it.  If the value says "MQs. 
Johnson" but the obvious (due to context) correction to "Mrs. Johnson" matches 
what the post-recovery block also says, then I'd say you are pretty good.  If 
the damage was to a floating point number or a header in which you don't have 
any useful context to guide you, you might want to engage a professional at 
this kind of thing if the data is very important to you.



 
Btrfs actually fixes (some) corruption once it detects it, so Thomas' question 
is not all that far-fetched.
 
-- Andreas Joseph Krogh
CTO / Partner - Visena AS
Mobile: +47 909 56 963
andr...@visena.com <mailto:andr...@visena.com>
www.visena.com <https://www.visena.com>
 <https://www.visena.com>


 


Re: Sv: Re: Sv: Re: Sv: Re: Sv: Re: data-checksums

2018-01-10 Thread Thomas Poty
Hello,
A question seems to be, according to me, important :
How a corruption, detected thanks to data-checksums, is fixed?

Thank you,
Thomas

Le 10 janv. 2018 20:39, "Andres Freund" <and...@anarazel.de> a écrit :

> On 2018-01-09 20:51:17 -0500, Stephen Frost wrote:
> > Greetings,
> >
> > * Andreas Joseph Krogh (andr...@visena.com) wrote:
> > > Aha, so enabling CRC causes hint-bits to be written causing extra
> WAL-logging,
> > > which woudn't be the case without CRC enabled?
> > > Thanks for pointing that out.
> >
> > Yes, having checksums enabled forces logging of hint bits.  You can
> > enable wal_log_hints independently too, without having checksums, to see
> > what kind of an impact it'll have on your environment.
> >
> > A useful documentation update might be:
> >
> > ---
> > With checksums enabled, wal_log_hints 
> > will be enabled and each page read or write will involve calculating the
> > checksum for the page.
> > ---
> >
> > I'd probably just replace the "Enabling checksums may incur a noticeable
> > performance penalty" with the above, as it should be clear that doing
> > more work implies an impact on performance and that avoids the whole
> > question of trying to characterize in a general way something that can't
> > be generalized (as it's workload dependent).
>
> -1. I think this is underplaying the cost.
>
> Greetings,
>
> Andres Freund
>
>


Re: Sv: Re: Sv: Re: Sv: Re: Sv: Re: data-checksums

2018-01-10 Thread Andres Freund
On 2018-01-09 20:51:17 -0500, Stephen Frost wrote:
> Greetings,
> 
> * Andreas Joseph Krogh (andr...@visena.com) wrote:
> > Aha, so enabling CRC causes hint-bits to be written causing extra 
> > WAL-logging, 
> > which woudn't be the case without CRC enabled?
> > Thanks for pointing that out.
> 
> Yes, having checksums enabled forces logging of hint bits.  You can
> enable wal_log_hints independently too, without having checksums, to see
> what kind of an impact it'll have on your environment.
> 
> A useful documentation update might be:
> 
> ---
> With checksums enabled, wal_log_hints 
> will be enabled and each page read or write will involve calculating the
> checksum for the page.
> ---
> 
> I'd probably just replace the "Enabling checksums may incur a noticeable
> performance penalty" with the above, as it should be clear that doing
> more work implies an impact on performance and that avoids the whole
> question of trying to characterize in a general way something that can't
> be generalized (as it's workload dependent).

-1. I think this is underplaying the cost.

Greetings,

Andres Freund



Re: Sv: Re: Sv: Re: Sv: Re: Sv: Re: data-checksums

2018-01-09 Thread Stephen Frost
Greetings,

* Andreas Joseph Krogh (andr...@visena.com) wrote:
> Aha, so enabling CRC causes hint-bits to be written causing extra 
> WAL-logging, 
> which woudn't be the case without CRC enabled?
> Thanks for pointing that out.

Yes, having checksums enabled forces logging of hint bits.  You can
enable wal_log_hints independently too, without having checksums, to see
what kind of an impact it'll have on your environment.

A useful documentation update might be:

---
With checksums enabled, wal_log_hints 
will be enabled and each page read or write will involve calculating the
checksum for the page.
---

I'd probably just replace the "Enabling checksums may incur a noticeable
performance penalty" with the above, as it should be clear that doing
more work implies an impact on performance and that avoids the whole
question of trying to characterize in a general way something that can't
be generalized (as it's workload dependent).

Thanks!

Stephen


signature.asc
Description: PGP signature


Sv: Re: Sv: Re: Sv: Re: Sv: Re: data-checksums

2018-01-09 Thread Andreas Joseph Krogh
På onsdag 10. januar 2018 kl. 01:01:26, skrev Andres Freund >:
On 2018-01-10 00:25:08 +0100, Andreas Joseph Krogh wrote:
 > På tirsdag 09. januar 2018 kl. 23:42:45, skrev Rob Sargent <
 > robjsarg...@gmail.com >:
 >  
 >
 >    On 01/09/2018 03:30 PM, Andreas Joseph Krogh wrote:
 > På tirsdag 09. januar 2018 kl. 23:06:06, skrev Andres Freund <
 > and...@anarazel.de >:
 > Hi,
 >
 >  On 2018-01-09 21:47:17 +0100, Andreas Joseph Krogh wrote:
 >  > Does PG use HW-accellerated crc if CPU supports it[1]?
 >
 >  Yes we do, for WAL checksums. The page checksums are a different
 >  algorithm though, one which has the advantage of being SIMD compatible.
 >
 >  The checksum computations have some impact, but if there's bigger impact
 >  it's much more likely to be related to the fact that some hint bit
 >  writes to a page now needs to be WAL logged.
 >  
 > But SIMD-instructions are also HW-accellerated by modern CPUs IIUC?

 Sure. Still measurable, but even if weren't, it's irrelevant given my
 primary point:

 >  The checksum computations have some impact, but if there's bigger impact
 >  it's much more likely to be related to the fact that some hint bit
 >  writes to a page now needs to be WAL logged.

 which isn't mitigated by SIMD / hardware CRC / whatnot.
 
Aha, so enabling CRC causes hint-bits to be written causing extra WAL-logging, 
which woudn't be the case without CRC enabled?
Thanks for pointing that out.
 
-- Andreas Joseph Krogh
CTO / Partner - Visena AS
Mobile: +47 909 56 963
andr...@visena.com 
www.visena.com 
 


 


Re: Sv: Re: Sv: Re: Sv: Re: data-checksums

2018-01-09 Thread Andres Freund
On 2018-01-10 00:25:08 +0100, Andreas Joseph Krogh wrote:
> På tirsdag 09. januar 2018 kl. 23:42:45, skrev Rob Sargent <
> robjsarg...@gmail.com >:
>  
> 
>    On 01/09/2018 03:30 PM, Andreas Joseph Krogh wrote:
> På tirsdag 09. januar 2018 kl. 23:06:06, skrev Andres Freund <
> and...@anarazel.de >:
> Hi,
> 
>  On 2018-01-09 21:47:17 +0100, Andreas Joseph Krogh wrote:
>  > Does PG use HW-accellerated crc if CPU supports it[1]?
> 
>  Yes we do, for WAL checksums. The page checksums are a different
>  algorithm though, one which has the advantage of being SIMD compatible.
> 
>  The checksum computations have some impact, but if there's bigger impact
>  it's much more likely to be related to the fact that some hint bit
>  writes to a page now needs to be WAL logged.
>  
> But SIMD-instructions are also HW-accellerated by modern CPUs IIUC?

Sure. Still measurable, but even if weren't, it's irrelevant given my
primary point:

>  The checksum computations have some impact, but if there's bigger impact
>  it's much more likely to be related to the fact that some hint bit
>  writes to a page now needs to be WAL logged.

which isn't mitigated by SIMD / hardware CRC / whatnot.

Greetings,

Andres Freund



Re: Sv: Re: Sv: Re: data-checksums

2018-01-09 Thread Rob Sargent



On 01/09/2018 03:30 PM, Andreas Joseph Krogh wrote:
På tirsdag 09. januar 2018 kl. 23:06:06, skrev Andres Freund 
>:


Hi,

On 2018-01-09 21:47:17 +0100, Andreas Joseph Krogh wrote:
> Does PG use HW-accellerated crc if CPU supports it[1]?

Yes we do, for WAL checksums. The page checksums are a different
algorithm though, one which has the advantage of being SIMD
compatible.

The checksum computations have some impact, but if there's bigger
impact
it's much more likely to be related to the fact that some hint bit
writes to a page now needs to be WAL logged.

But SIMD-instructions are also HW-accellerated by modern CPUs IIUC?
So, if these CRCs all are HW-accelerated the penalty chould be next to 
neglishable?



Leading directly back to JD's proposed documentation update.


Sv: Re: Sv: Re: data-checksums

2018-01-09 Thread Andreas Joseph Krogh
På tirsdag 09. januar 2018 kl. 23:06:06, skrev Andres Freund >:
Hi,

 On 2018-01-09 21:47:17 +0100, Andreas Joseph Krogh wrote:
 > Does PG use HW-accellerated crc if CPU supports it[1]?

 Yes we do, for WAL checksums. The page checksums are a different
 algorithm though, one which has the advantage of being SIMD compatible.

 The checksum computations have some impact, but if there's bigger impact
 it's much more likely to be related to the fact that some hint bit
 writes to a page now needs to be WAL logged.
 
But SIMD-instructions are also HW-accellerated by modern CPUs IIUC?
 
So, if these CRCs all are HW-accelerated the penalty chould be next to 
neglishable?
 
-- Andreas Joseph Krogh
CTO / Partner - Visena AS
Mobile: +47 909 56 963
andr...@visena.com 
www.visena.com 
 


 


Re: data-checksums

2018-01-09 Thread Joshua D. Drake

On 01/09/2018 12:22 PM, Andres Freund wrote:

On 2018-01-09 20:04:04 +0100, Rakesh Kumar wrote:

I also would like to believe that the hit is small, but when PG
official document writes "noticeable performance penalty", it becomes
difficult to convince management that the hit is small :-)

Why believe, when you can measure?

yup doing that.  But I still feel that PG documentation should stay
away from such scare mongering.  Or did the lawyers write that :)

So we should rather lie about it having a potential for performance
impact? Who'd be helped by that?


It isn't a lie, it depends on the workload and hardware. Adjusting the 
documentation to say something like the following probably isn't a bad idea:


The use of the data checksum feature may incur a performance penalty. 
However, this does depend on your particular workload and provisioned 
hardware. It is wise to test the feature based on your specific 
requirements.




JD

--
Command Prompt, Inc. || http://the.postgres.company/ || @cmdpromptinc

PostgreSQL centered full stack support, consulting and development.
Advocate: @amplifypostgres || Learn: https://postgresconf.org
* Unless otherwise stated, opinions are my own.   *




Re: data-checksums

2018-01-09 Thread Andres Freund
On 2018-01-09 20:04:04 +0100, Rakesh Kumar wrote:
> 
> > > I also would like to believe that the hit is small, but when PG
> > > official document writes "noticeable performance penalty", it becomes
> > > difficult to convince management that the hit is small :-)
> > 
> > Why believe, when you can measure?
> 
> yup doing that.  But I still feel that PG documentation should stay
> away from such scare mongering.  Or did the lawyers write that :)

So we should rather lie about it having a potential for performance
impact? Who'd be helped by that?

Greetings,

Andres Freund



Re: data-checksums

2018-01-09 Thread George Neuner
"On Tue, 9 Jan 2018 20:02:37 +0100, "Rakesh Kumar"
 wrote:

>Hello Mr. Pedantic,
>
>> noticeable != huge.
>
>and noticeable != small/negligible either, at least from English
>point of view.

small != negligible. 

The word "noticable" does not imply any particular magnitude of event.
It means only that  can be observed.

There is no canon technical usage.  In layman's use, "noticable" often
*does* imply that an effect is small, and that one might not see it if
not looking specifically for it.


Unfortunately, English is a slippery language.  Perhaps technical
documents should be written in Sumerian.

YMMV,
George




Re: data-checksums

2018-01-09 Thread Rakesh Kumar

> > I also would like to believe that the hit is small, but when PG
> > official document writes "noticeable performance penalty", it becomes
> > difficult to convince management that the hit is small :-)
> 
> Why believe, when you can measure?

yup doing that.  But I still feel that PG documentation should stay away from 
such scare mongering.  Or did the lawyers write that :)



Re: data-checksums

2018-01-09 Thread Alvaro Herrera
Rakesh Kumar wrote:
> 
> > That said, imv anyway, the performance hit is small and having
> > checksums is well worth it.
> 
> I also would like to believe that the hit is small, but when PG
> official document writes "noticeable performance penalty", it becomes
> difficult to convince management that the hit is small :-)

Why believe, when you can measure?

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: data-checksums

2018-01-09 Thread Andres Freund
On 2018-01-09 18:58:41 +0100, Rakesh Kumar wrote:
> 
> > That said, imv anyway, the performance hit is small and having checksums
> > is well worth it.
> 
> I also would like to believe that the hit is small, but when PG
> official document writes "noticeable performance penalty", it becomes
> difficult to convince management that the hit is small :-)

noticeable != huge.

- Andres



Re: data-checksums

2018-01-09 Thread Rakesh Kumar

> That said, imv anyway, the performance hit is small and having checksums
> is well worth it.

I also would like to believe that the hit is small, but when PG official 
document writes "noticeable performance penalty", it becomes difficult to 
convince management that the hit is small :-)



Re: data-checksums

2018-01-09 Thread Stephen Frost
Greetings,

* Rakesh Kumar (rakeshkumar...@mail.com) wrote:
> --data-checksums
> Use checksums on data pages to help detect corruption by the I/O system that 
> would otherwise be silent. Enabling checksums may incur a noticeable 
> performance penalty. This option can only be set during initialization, and 
> cannot be changed later. If set, checksums are calculated for all objects, in 
> all databases.
> 
> 
> If I understand it correctly, the performance penalty is when the blocks are 
> written to the disk by the background writer and/or during checkpoint.  Given 
> that the process is async and application does not wait on it, is the 
> performance penalty really a big concern.

There's also a hit when pages are read back in, since we need to
calculate the checksum and verify it hasn't changed.

That said, imv anyway, the performance hit is small and having checksums
is well worth it.

Thanks!

Stephen


signature.asc
Description: PGP signature


data-checksums

2018-01-09 Thread Rakesh Kumar
--data-checksums
Use checksums on data pages to help detect corruption by the I/O system that 
would otherwise be silent. Enabling checksums may incur a noticeable 
performance penalty. This option can only be set during initialization, and 
cannot be changed later. If set, checksums are calculated for all objects, in 
all databases.


If I understand it correctly, the performance penalty is when the blocks are 
written to the disk by the background writer and/or during checkpoint.  Given 
that the process is async and application does not wait on it, is the 
performance penalty really a big concern.

thanks