Re: RFC: Add 'taint' field to pg_control.
On Wed, Feb 28, 2018 at 04:16:53PM -0600, Justin Pryzby wrote:
> On Wed, Feb 28, 2018 at 01:43:11PM -0800, Andres Freund wrote:
> > a significant number of times during investigations of bugs I wondered
> > whether running the cluster with various settings, or various tools
> > could've caused the issue at hand. Therefore I'd like to propose adding
> > a 'tainted' field to pg_control, that contains some of the "history" of
> > the cluster. Individual bits inside that field that I can think of right
> > now are:
> > - pg_resetxlog was used non-passively on cluster
> > - ran with fsync=off
> > - ran with full_page_writes=off
> > - pg_upgrade was used
> + cluster was downgraded (between minor releases)

Dredging up a years-old thread since I thought that was worth adding..

-- Justin
Re: RFC: Add 'taint' field to pg_control.
On 2018-03-07 23:34:37 -0500, Tom Lane wrote:
> Craig Ringer writes:
> > As I understand it, because we allow multiple Pg instances on a system, we
> > identify the small sysv shmem segment we use by the postmaster's pid. If
> > you remove the DirLockFile (postmaster.pid) you remove the interlock
> > against starting a new postmaster. It'll think it's a new independent
> > instance on the same host, make a new shmem segment and go merrily on its
> > way mangling data horribly.
>
> Yeah. If we realized that the old shmem segment was associated with this
> data directory, we could check for processes still attached to it ... but
> the lock file is exactly where that association is kept.

I'd somehow remembered that we just took the path as the identifier, but that's wrong...

Greetings, Andres Freund
Re: RFC: Add 'taint' field to pg_control.
On 8 March 2018 at 12:34, Tom Lane wrote:
> Craig Ringer writes:
> > As I understand it, because we allow multiple Pg instances on a system, we
> > identify the small sysv shmem segment we use by the postmaster's pid. If
> > you remove the DirLockFile (postmaster.pid) you remove the interlock
> > against starting a new postmaster. It'll think it's a new independent
> > instance on the same host, make a new shmem segment and go merrily on its
> > way mangling data horribly.
>
> Yeah. If we realized that the old shmem segment was associated with this
> data directory, we could check for processes still attached to it ... but
> the lock file is exactly where that association is kept.
>
> > It'd be nice if the OS offered us some support here. Something like opening
> > a lockfile in exclusive lock mode, then inheriting the FD and lock on all
> > children, with each child inheriting the lock. So the exclusive lock
> > wouldn't get released until all FDs associated with it are released. But
> > AFAIK nothing like that is present, let alone portable.
>
> I've wondered about that too. An inheritable lock on the data directory
> itself would be ideal, but that doesn't exist anywhere AFAIK. We could
> imagine taking a BSD-style flock(2) lock on some lock file, on systems that
> have that; but not all do, so it couldn't be our only protection. Another
> problem is that since it'd have to be an ordinary file, there'd still be a
> risk of cowboy DBAs removing the lock file. Maybe we could use pg_control
> as the lock file?

"The error mentioned that there was another active database using this pg_control file, so I deleted it, and now my database won't start."

Like our discussion about pg_resetxlog, there's a limit to how much operator error we can guard against. More importantly, we'd have to be _very_ sure it was correct, or our attempts to protect the user would risk creating a major outage.

That said, flock(2) with LOCK_NB seems to have just the semantics we want:

"A single file may not simultaneously have both shared and exclusive locks.

Locks created by flock() are associated with an open file description (see open(2)). This means that duplicate file descriptors (created by, for example, fork(2) or dup(2)) refer to the same lock, and this lock may be modified or released using any of these file descriptors. Furthermore, the lock is released either by an explicit LOCK_UN operation on any of these duplicate file descriptors, or when all such file descriptors have been closed."

Worth looking into, but really important not to make a mistake and make things worse.

-- Craig Ringer http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Re: RFC: Add 'taint' field to pg_control.
Craig Ringer writes:
> As I understand it, because we allow multiple Pg instances on a system, we
> identify the small sysv shmem segment we use by the postmaster's pid. If
> you remove the DirLockFile (postmaster.pid) you remove the interlock
> against starting a new postmaster. It'll think it's a new independent
> instance on the same host, make a new shmem segment and go merrily on its
> way mangling data horribly.

Yeah. If we realized that the old shmem segment was associated with this data directory, we could check for processes still attached to it ... but the lock file is exactly where that association is kept.

> It'd be nice if the OS offered us some support here. Something like opening
> a lockfile in exclusive lock mode, then inheriting the FD and lock on all
> children, with each child inheriting the lock. So the exclusive lock
> wouldn't get released until all FDs associated with it are released. But
> AFAIK nothing like that is present, let alone portable.

I've wondered about that too. An inheritable lock on the data directory itself would be ideal, but that doesn't exist anywhere AFAIK. We could imagine taking a BSD-style flock(2) lock on some lock file, on systems that have that; but not all do, so it couldn't be our only protection. Another problem is that since it'd have to be an ordinary file, there'd still be a risk of cowboy DBAs removing the lock file. Maybe we could use pg_control as the lock file?

regards, tom lane
Re: RFC: Add 'taint' field to pg_control.
On 8 March 2018 at 10:18, Andres Freund wrote:
> On March 7, 2018 5:51:29 PM PST, Craig Ringer wrote:
> >My favourite remains an organisation that kept "fixing" an issue by kill
> >-9'ing the postmaster and removing postmaster.pid to make it start up
> >again. Without killing all the leftover backends. Of course, the system
> >kept getting more unstable and broken, so they did it more and more often.
> >They were working on scripting it when they gave up and asked for help.
>
> Maybe I'm missing something, but that ought to not work. The shmem segment
> that we keep around would be a conflict, no?

As I understand it, because we allow multiple Pg instances on a system, we identify the small sysv shmem segment we use by the postmaster's pid. If you remove the DirLockFile (postmaster.pid) you remove the interlock against starting a new postmaster. It'll think it's a new independent instance on the same host, make a new shmem segment and go merrily on its way mangling data horribly.

See CreateLockFile(). Also 7e2a18a9161. In particular src/backend/utils/init/miscinit.c +938:

    if (isDDLock)
    {
        if (PGSharedMemoryIsInUse(id1, id2))
            ereport(FATAL,
                    (errcode(ERRCODE_LOCK_FILE_EXISTS),
                     errmsg("pre-existing shared memory block "
                            "(key %lu, ID %lu) is still in use",
                            id1, id2),
                     errhint("If you're sure there are no old "
                             "server processes still running, remove "
                             "the shared memory block "
                             "or just delete the file \"%s\".",
                             filename)));
    }

I still think that error is a bit optimistic, and should really say "make very sure there are no 'postgres' processes associated with this data directory, then ...".

It'd be nice if the OS offered us some support here. Something like opening a lockfile in exclusive lock mode, then inheriting the FD and lock on all children, with each child inheriting the lock. So the exclusive lock wouldn't get released until all FDs associated with it are released. But AFAIK nothing like that is present, let alone portable.
-- Craig Ringer http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Re: RFC: Add 'taint' field to pg_control.
On March 7, 2018 5:51:29 PM PST, Craig Ringer wrote:
>My favourite remains an organisation that kept "fixing" an issue by kill
>-9'ing the postmaster and removing postmaster.pid to make it start up
>again. Without killing all the leftover backends. Of course, the system
>kept getting more unstable and broken, so they did it more and more often.
>They were working on scripting it when they gave up and asked for help.

Maybe I'm missing something, but that ought to not work. The shmem segment that we keep around would be a conflict, no?

Andres

-- Sent from my Android device with K-9 Mail. Please excuse my brevity.
Re: RFC: Add 'taint' field to pg_control.
On 8 March 2018 at 04:58, Robert Haas wrote:
> On Wed, Feb 28, 2018 at 8:03 PM, Craig Ringer wrote:
> > A huge +1 from me for the idea. I can't even count the number of black box
> > "WTF did you DO?!?" servers I've looked at, where bizarre behaviour has
> > turned out to be down to the user doing something very silly and not saying
> > anything about it.
>
> +1 from me, too.

My favourite remains an organisation that kept "fixing" an issue by kill -9'ing the postmaster and removing postmaster.pid to make it start up again. Without killing all the leftover backends. Of course, the system kept getting more unstable and broken, so they did it more and more often. They were working on scripting it when they gave up and asked for help.

The data recovery effort on that one was truly exciting. I remember looking at bash history and having to take a short break to figure out how on earth to communicate what was going on.

-- Craig Ringer http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Re: RFC: Add 'taint' field to pg_control.
On Wed, Feb 28, 2018 at 8:03 PM, Craig Ringer wrote:
> A huge +1 from me for the idea. I can't even count the number of black box
> "WTF did you DO?!?" servers I've looked at, where bizarre behaviour has
> turned out to be down to the user doing something very silly and not saying
> anything about it.

+1 from me, too.

-- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: RFC: Add 'taint' field to pg_control.
On Thu, Mar 01, 2018 at 09:12:18AM +0800, Craig Ringer wrote:
> On 1 March 2018 at 06:28, Justin Pryzby wrote:
> > The more fine grained these are the more useful they can be:
> >
> > Running with fsync=off is common advice while loading, so reporting that
> > "fsync=off at some point" is much less useful than reporting "having to run
> > recovery from an fsync=off database".
> >
> > I propose there could be two bits:
> >
> > 1. fsync was off at some point in history when the DB needed recovery;
> > 2. fsync was off during the current instance; I assume the bit would be set
> >    whenever fsync=off was set, but could be cleared when the server was
> >    cleanly shut down. If the bit is set during recovery, the first bit would
> >    be permanently set. Not sure, but if this bit is set and then fsync is
> >    re-enabled, maybe this could be cleared after any checkpoint and not just
> >    at shutdown?
>
> I think that's a very useful way to eliminate false positives and make this
> diagnostic field more useful. But we'd need to guarantee that when you've
> switched from fsync=off to fsync=on, we've actually fsync'd every extent of
> every table and index, whether or not we think they're currently dirty and
> whether or not the smgr currently has them open. Do something like initdb
> does to flush everything. Without that, we could have WAL-vs-heap ordering
> issues and so on *even after a Pg restart* if the system has lots of dirty
> writeback to do.

I think a similar "two bit" scheme *could* be applied to full_page_writes, too, and the second "soft" bit for FPW could also be cleared after a checkpoint. I'd think that for FPW, previous checkpoints which had fsync=off wouldn't need special handling - so long as FPW=on, the "soft" bit can be cleared after a successful checkpoint (as a micro-optimization, don't clear the "soft" bit if the "hard" bit is set).

BTW, this scheme has the amusing consequence that setting fsync=off should involve first writing+fsyncing these bits to pg_control.

Justin
Re: RFC: Add 'taint' field to pg_control.
On 1 March 2018 at 09:00, Justin Pryzby wrote:
> > > - started in single user mode or with system indices disabled?
> > why?
>
> Some of these I suggested just as a datapoint (or other brainstorms I couldn't
> immediately reject). A cluster where someone has UPDATEd pg_* (even
> pg_statistic) or otherwise hacked on I would tend to think about differently
> than a "pristine" cluster that's never seen anything more interesting than a
> join.

How about "has run in a state where system catalog modifications are permitted"? Because I've had one seriously frustrating case where that would've been extremely pertinent information.

That said, I'm not convinced it's worth the effort, and I think this is going off into the weeds a little. Let's focus on Andres's core suggestion (and maybe refine the fsync part to get rid of false positives due to bulk loading) and build on from there in subsequent work.

-- Craig Ringer http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Re: RFC: Add 'taint' field to pg_control.
On 1 March 2018 at 06:28, Justin Pryzby wrote:
> On Wed, Feb 28, 2018 at 02:18:12PM -0800, Andres Freund wrote:
> > On 2018-02-28 17:14:18 -0500, Peter Eisentraut wrote:
> > > I can see why you'd want that, but as a DBA, I don't necessarily want
> > > all of that recorded, especially in a quasi-permanent way.
> >
> > Huh? You're arguing that we should make it easier for DBAs to hide
> > potential causes of corruption? I fail to see when that's necessary /
> > desirable? That a cluster ran with fsync=off isn't an embarrassing thing,
> > nor does it contain private data...
>
> The more fine grained these are the more useful they can be:
>
> Running with fsync=off is common advice while loading, so reporting that
> "fsync=off at some point" is much less useful than reporting "having to run
> recovery from an fsync=off database".
>
> I propose there could be two bits:
>
> 1. fsync was off at some point in history when the DB needed recovery;
> 2. fsync was off during the current instance; I assume the bit would be set
>    whenever fsync=off was set, but could be cleared when the server was
>    cleanly shut down. If the bit is set during recovery, the first bit would
>    be permanently set. Not sure, but if this bit is set and then fsync is
>    re-enabled, maybe this could be cleared after any checkpoint and not just
>    at shutdown?

I think that's a very useful way to eliminate false positives and make this diagnostic field more useful. But we'd need to guarantee that when you've switched from fsync=off to fsync=on, we've actually fsync'd every extent of every table and index, whether or not we think they're currently dirty and whether or not the smgr currently has them open. Do something like initdb does to flush everything. Without that, we could have WAL-vs-heap ordering issues and so on *even after a Pg restart* if the system has lots of dirty writeback to do.
A quick look at CheckPointGuts and CheckPointBuffers suggests we don't presently do that but I admit I haven't read in depth. We force a fsync of the current WAL segment when switching fsync on, but don't do anything else AFAICS. -- Craig Ringer http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Re: RFC: Add 'taint' field to pg_control.
On 1 March 2018 at 05:43, Andres Freund wrote:
> Hi,
>
> a significant number of times during investigations of bugs I wondered
> whether running the cluster with various settings, or various tools
> could've caused the issue at hand. Therefore I'd like to propose adding
> a 'tainted' field to pg_control, that contains some of the "history" of
> the cluster. Individual bits inside that field that I can think of right
> now are:
> - pg_resetxlog was used non-passively on cluster
> - ran with fsync=off
> - ran with full_page_writes=off
> - pg_upgrade was used
>
> What do others think?

A huge +1 from me for the idea. I can't even count the number of black box "WTF did you DO?!?" servers I've looked at, where bizarre behaviour has turned out to be down to the user doing something very silly and not saying anything about it.

It's only some flags, so putting it in pg_control is arguably somewhat wasteful, but so minor as to be of no real concern. And that's probably the best way to make sure it follows the cluster around no matter what backup/restore/copy mechanisms are used and how "clever" they try to be.

What I'd _really_ love would be to blow the scope of this up a bit and turn it into a key-events cluster journal, recording key param switches, recoveries (and lsn ranges), pg_upgrades, etc. But then we'd run into people with weird workloads who managed to make it some massive file, we'd have to make sure we had a way to stop it getting left out of copies/backups, and it'd generally be irritating. So let's not do that. Proper support for class-based logging and multiple outputs would be a good solution for this at some future point. What you propose is simple enough to be quick to implement, adds no admin overhead, and will be plenty useful enough.

I'd like to add "postmaster.pid was absent when the cluster started" to this list, please. Sure, it's not conclusive, and there are legit reasons why that might be the case, but so often it's somebody kill -9'ing the postmaster, then removing the postmaster.pid and starting up again without killing the workers.

-- Craig Ringer http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Re: RFC: Add 'taint' field to pg_control.
On Wed, Feb 28, 2018 at 02:23:19PM -0800, Andres Freund wrote:
> Hi,
>
> On 2018-02-28 16:16:53 -0600, Justin Pryzby wrote:
> > - did recovery (you could use "needed recovery" instead, but then there's the
> >   question of how reliable that field would be);
> >   + or: timestamp of most recent recovery (attempt?)
> What'd that be useful for?

Theoretically nothing, but conceivably useful if there's an issue with recovery. I recall various things that historically weren't WAL-logged but should have been.

> > - local_preload_libraries?
> Hm?

Not sure; but in any case I meant *_preload_libraries.

> > - started in single user mode or with system indices disabled?
> why?

Some of these I suggested just as a datapoint (or other brainstorms I couldn't immediately reject). A cluster where someone has UPDATEd pg_* (even pg_statistic) or otherwise hacked on I would tend to think about differently than a "pristine" cluster that's never seen anything more interesting than a join.

Justin
Re: RFC: Add 'taint' field to pg_control.
On Wed, Feb 28, 2018 at 02:18:12PM -0800, Andres Freund wrote:
> On 2018-02-28 17:14:18 -0500, Peter Eisentraut wrote:
> > I can see why you'd want that, but as a DBA, I don't necessarily want
> > all of that recorded, especially in a quasi-permanent way.
>
> Huh? You're arguing that we should make it easier for DBAs to hide
> potential causes of corruption? I fail to see when that's necessary /
> desirable? That a cluster ran with fsync=off isn't an embarrassing thing,
> nor does it contain private data...

The more fine grained these are the more useful they can be:

Running with fsync=off is common advice while loading, so reporting that "fsync=off at some point" is much less useful than reporting "having to run recovery from an fsync=off database".

I propose there could be two bits:

1. fsync was off at some point in history when the DB needed recovery;
2. fsync was off during the current instance; I assume the bit would be set
   whenever fsync=off was set, but could be cleared when the server was
   cleanly shut down. If the bit is set during recovery, the first bit would
   be permanently set. Not sure, but if this bit is set and then fsync is
   re-enabled, maybe this could be cleared after any checkpoint and not just
   at shutdown?

Justin
Re: RFC: Add 'taint' field to pg_control.
Hi,

On 2018-02-28 16:16:53 -0600, Justin Pryzby wrote:

Unfortunately your list seems to raise the bar to a place I don't see us going soon :(

> - pg_control versions used on this cluster (hopefully a full list..obviously
>   not going back before PG11);

That needs arbitrarily much space, which isn't feasible for pg_control. It needs to stay <= 512 bytes.

> - did recovery (you could use "needed recovery" instead, but then there's the
>   question of how reliable that field would be);
>   + or: timestamp of most recent recovery (attempt?)

What'd that be useful for?

> - index corruption detected (on system table?); Note that "CREATE IF NOT
>   EXIST" doesn't avoid unique key errors on system tables, so it's not useful
>   to log every ERROR on system tables.

I don't think we have an easy way to diagnose this specifically enough.

> - page %u is uninitialized --- fixing

Doesn't really indicate a bug / problem.

> - here's one I just dug up: ERROR: missing chunk number 0 for toast value
>   10270298 in pg_toast_2619

Hm.

> - XID wraparound?

why?

> - autovacuum disabled?

why?

> - checksum failure (in system relation?)

Hm.

> - local_preload_libraries?

Hm?

> - started in single user mode or with system indices disabled?

why?

> - hit assertion or PANIC ??

We certainly don't write to persistent data in those cases.

> - UPDATE/DELETE/INSERT on system table ? (or maintenance command like
>   vacuum/analyze/cluster/reindex?)

Hm, the former I could see some use for, but it might end up being pretty ugly locking- and layering-wise.

Greetings, Andres Freund
Re: RFC: Add 'taint' field to pg_control.
Hi,

On 2018-02-28 17:14:18 -0500, Peter Eisentraut wrote:
> I can see why you'd want that, but as a DBA, I don't necessarily want
> all of that recorded, especially in a quasi-permanent way.

Huh? You're arguing that we should make it easier for DBAs to hide potential causes of corruption? I fail to see when that's necessary / desirable? That a cluster ran with fsync=off isn't an embarrassing thing, nor does it contain private data...

Greetings, Andres Freund
Re: RFC: Add 'taint' field to pg_control.
On Wed, Feb 28, 2018 at 01:43:11PM -0800, Andres Freund wrote:
> a significant number of times during investigations of bugs I wondered
> whether running the cluster with various settings, or various tools
> could've caused the issue at hand. Therefore I'd like to propose adding
> a 'tainted' field to pg_control, that contains some of the "history" of
> the cluster. Individual bits inside that field that I can think of right
> now are:
> - pg_resetxlog was used non-passively on cluster
> - ran with fsync=off
> - ran with full_page_writes=off
> - pg_upgrade was used

What about:

- pg_control versions used on this cluster (hopefully a full list..obviously
  not going back before PG11);
- did recovery (you could use "needed recovery" instead, but then there's the
  question of how reliable that field would be);
  + or: timestamp of most recent recovery (attempt?)
- index corruption detected (on system table?); Note that "CREATE IF NOT
  EXIST" doesn't avoid unique key errors on system tables, so it's not useful
  to log every ERROR on system tables.
- page %u is uninitialized --- fixing
- here's one I just dug up: ERROR: missing chunk number 0 for toast value
  10270298 in pg_toast_2619
- XID wraparound?
- autovacuum disabled?
- checksum failure (in system relation?)
- local_preload_libraries?
- started in single user mode or with system indices disabled?
- hit assertion or PANIC ??
- UPDATE/DELETE/INSERT on system table ? (or maintenance command like
  vacuum/analyze/cluster/reindex?)

Justin
Re: RFC: Add 'taint' field to pg_control.
On 2018-02-28 23:13:44 +0100, Tomas Vondra wrote:
> On 02/28/2018 10:43 PM, Andres Freund wrote:
> > Hi,
> >
> > a significant number of times during investigations of bugs I wondered
> > whether running the cluster with various settings, or various tools
> > could've caused the issue at hand. Therefore I'd like to propose adding
> > a 'tainted' field to pg_control, that contains some of the "history" of
> > the cluster. Individual bits inside that field that I can think of right
> > now are:
> > - pg_resetxlog was used non-passively on cluster
> > - ran with fsync=off
> > - ran with full_page_writes=off
> > - pg_upgrade was used
> >
> > What do others think?
>
> Isn't the uncertainty usually that you know one of these things happened
> on the cluster, but you don't know if that's the actual root cause?
> That's my experience, at least.

My experience is that you get a bug report and you have no clue what happened with the cluster. Often the "owner" doesn't have one either. Then you go through various theories and end up blaming it on one of these, even though it quite possibly never happened.

Of course this does *not* solve the issue when these actually occurred. It's just a tool to narrow down possible causes / exclude others.

Greetings, Andres Freund
Re: RFC: Add 'taint' field to pg_control.
On 2/28/18 16:43, Andres Freund wrote:
> a significant number of times during investigations of bugs I wondered
> whether running the cluster with various settings, or various tools
> could've caused the issue at hand. Therefore I'd like to propose adding
> a 'tainted' field to pg_control, that contains some of the "history" of
> the cluster. Individual bits inside that field that I can think of right
> now are:
> - pg_resetxlog was used non-passively on cluster
> - ran with fsync=off
> - ran with full_page_writes=off
> - pg_upgrade was used

I can see why you'd want that, but as a DBA, I don't necessarily want all of that recorded, especially in a quasi-permanent way.

Maybe this could be stored in a separate location, like a small text/log-like file in the data directory. Then DBAs can save or keep or remove that information as they wish.

-- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: RFC: Add 'taint' field to pg_control.
On 02/28/2018 10:43 PM, Andres Freund wrote:
> Hi,
>
> a significant number of times during investigations of bugs I wondered
> whether running the cluster with various settings, or various tools
> could've caused the issue at hand. Therefore I'd like to propose adding
> a 'tainted' field to pg_control, that contains some of the "history" of
> the cluster. Individual bits inside that field that I can think of right
> now are:
> - pg_resetxlog was used non-passively on cluster
> - ran with fsync=off
> - ran with full_page_writes=off
> - pg_upgrade was used
>
> What do others think?

Isn't the uncertainty usually that you know one of these things happened on the cluster, but you don't know if that's the actual root cause? That's my experience, at least.

That being said, I'm not strongly opposed to adding such a field.

regards

-- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services