Re: [HACKERS] First-draft release notes for next week's releases

2014-03-19 Thread Josh Berkus
All,

So, I'll ask again (because I didn't see a reply): is there any way
users can *check* if they've been corrupted?  Short of waiting for PK/FK
violations?

Given that all of the fixes we recommend involve extensive downtimes,
nobody is going to want to do them just in case.  How can they test
for the issue?  I can't see any reasonable way ...

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] First-draft release notes for next week's releases

2014-03-19 Thread Tom Lane
Josh Berkus j...@agliodbs.com writes:
 So, I'll ask again (because I didn't see a reply): is there any way
 users can *check* if they've been corrupted?  Short of waiting for PK/FK
 violations?

I think it would work to do a REINDEX on each table (doesn't matter which
index you select) and see if it complains about not being able to find any
parent tuples.  This would definitely catch the case as long as the
orphaned HOT tuple is still live.  If it's not, the immediate problem is
gone ... but it's still possible you have follow-on corruption.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] First-draft release notes for next week's releases

2014-03-19 Thread Alvaro Herrera
Josh Berkus wrote:
 All,
 
 So, I'll ask again (because I didn't see a reply): is there any way
 users can *check* if they've been corrupted?  Short of waiting for PK/FK
 violations?

Obviously there are queries you can run to check each FK -- the same
queries that ri_triggers.c would run when you create an FK.  It's
cumbersome to write, but not impossible.  In fact, it can be done
mechanically.

-- 
Álvaro Herrerahttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training  Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] First-draft release notes for next week's releases

2014-03-19 Thread Josh Berkus
On 03/19/2014 02:01 PM, Alvaro Herrera wrote:
 Josh Berkus wrote:
 All,

 So, I'll ask again (because I didn't see a reply): is there any way
 users can *check* if they've been corrupted?  Short of waiting for PK/FK
 violations?
 
 Obviously there are queries you can run to check each FK -- the same
 queries that ri_triggers.c would run when you create an FK.  It's
 cumbersome to write, but not impossible.  In fact, it can be done
 mechanically.

Would users which this corruption necessarily have broken FKs which
would show up as such on a simple query?  That is, if I did:

SELECT ref_id FROM referenced WHERE ref_id NOT IN ( SELECT ref_id FROM
referencing )

... or something similar, would that show the issue?

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] First-draft release notes for next week's releases

2014-03-19 Thread Alvaro Herrera
Josh Berkus wrote:
 On 03/19/2014 02:01 PM, Alvaro Herrera wrote:
  Josh Berkus wrote:
  All,
 
  So, I'll ask again (because I didn't see a reply): is there any way
  users can *check* if they've been corrupted?  Short of waiting for PK/FK
  violations?

Some notes:

1. if there's been no crash with 9.3 installed in a single system, or in
a master system, corruption cannot have occured.

2. replicas are very likely to have gotten corrupted if referenced
tables are updated at all.  Many workloads do not update referenced
tables; those are not at risk.

3. Master that are failed-over at-risk replicas are thus very likely to
have been corrupted.

  Obviously there are queries you can run to check each FK -- the same
  queries that ri_triggers.c would run when you create an FK.  It's
  cumbersome to write, but not impossible.  In fact, it can be done
  mechanically.
 
 Would users which this corruption necessarily have broken FKs which
 would show up as such on a simple query?  That is, if I did:
 
 SELECT ref_id FROM referenced WHERE ref_id NOT IN ( SELECT ref_id FROM
 referencing )
 
 ... or something similar, would that show the issue?

Yes, AFAICT that would show the issue, as long as the query uses an
index.  I assume, without checking, that setting enable_seqscan to OFF
would have that effect on most but the largest tables.  I think it'd be
better to write that as an EXISTS query, though.  You also need to
consider details such as the MATCH mode of the FK, for multicolumn ones.

-- 
Álvaro Herrerahttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training  Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] First-draft release notes for next week's releases

2014-03-18 Thread Josh Berkus
Folks:

So another question, which I've already received from the field, is how
can you detect this kind of corruption in the first place, if it's not
causing a user-visible error?

Got that question from someone who failed over between master and
replica on 9.3.2 last weekend.  They're not seeing PK/FK errors, but
that doesn't mean they're clean.  How can they tell?

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] First-draft release notes for next week's releases

2014-03-17 Thread Tom Lane
Greg Stark st...@mit.edu writes:
 This is not really accurate:
 This error allowed multiple versions of the same row to become
 visible to queries, resulting in apparent duplicates. Since the error
 is in WAL replay, it would only manifest during crash recovery or on
 standby servers.

 I think the idea is coming from what the second sentence below is
 getting at but it may be too complex to explain in a release note:

 The error causes some rows to disappear from indexes resulting in
 inconsistent query results on a hot standby depending on whether
 indexes are used. If the standby is subsequently activated or if it
 occurs during recovery after a crash or backup restore it could result
 in unique constraint violations as well.

Hm ... rows disappearing from indexes might make people think that
they could fix or mitigate the damage via REINDEX.  That's not really
true though is it?  It looks to me like IndexBuildHeapScan will suffer
an Assert failure in an assert-enabled build, or build a bogus index
if not assert-enabled, when it comes across a heap-only tuple that
has no parent.

I'm thinking we'd better promote that Assert to a normal runtime elog.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] First-draft release notes for next week's releases

2014-03-17 Thread Josh Berkus
On 03/17/2014 08:28 AM, Tom Lane wrote:
 Greg Stark st...@mit.edu writes:
 The error causes some rows to disappear from indexes resulting in
 inconsistent query results on a hot standby depending on whether
 indexes are used. If the standby is subsequently activated or if it
 occurs during recovery after a crash or backup restore it could result
 in unique constraint violations as well.
 
 Hm ... rows disappearing from indexes might make people think that
 they could fix or mitigate the damage via REINDEX.  That's not really
 true though is it?  It looks to me like IndexBuildHeapScan will suffer
 an Assert failure in an assert-enabled build, or build a bogus index
 if not assert-enabled, when it comes across a heap-only tuple that
 has no parent.

First, see suggested text in my first-draft release announcement.

Second, if a user has encountered this kind of data corruption on their
master (due to crash recovery), how exactly *do* they fix it?

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] First-draft release notes for next week's releases

2014-03-17 Thread Andres Freund
On 2014-03-17 10:03:52 -0700, Josh Berkus wrote:
 On 03/17/2014 08:28 AM, Tom Lane wrote:
  Greg Stark st...@mit.edu writes:
  The error causes some rows to disappear from indexes resulting in
  inconsistent query results on a hot standby depending on whether
  indexes are used. If the standby is subsequently activated or if it
  occurs during recovery after a crash or backup restore it could result
  in unique constraint violations as well.
  
  Hm ... rows disappearing from indexes might make people think that
  they could fix or mitigate the damage via REINDEX.  That's not really
  true though is it?  It looks to me like IndexBuildHeapScan will suffer
  an Assert failure in an assert-enabled build, or build a bogus index
  if not assert-enabled, when it comes across a heap-only tuple that
  has no parent.
 
 First, see suggested text in my first-draft release announcement.

I don't think that text is any better, it's imo even wrong:
The bug causes rows to vanish from indexes during recovery due to
simultaneous updates of rows on both sides of a foreign key.

Neither is a foreign key, nor simultaneous updates, nor both sides a
prerequisite.

 Second, if a user has encountered this kind of data corruption on their
 master (due to crash recovery), how exactly *do* they fix it?

Dump/restore is the most obvious candidate. The next best thing I can
think of is a noop rewriting ALTER TABLE, that doesn't deal with ctid
chains IIRC, in contrast to CLUSTER/VACUUM FULL.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] First-draft release notes for next week's releases

2014-03-17 Thread Andres Freund
On 2014-03-15 16:02:19 -0400, Tom Lane wrote:
 First-draft release notes are committed, and should be visible at
 http://www.postgresql.org/docs/devel/static/release-9-3-4.html
 once guaibasaurus does its next buildfarm run a few minutes from
 now.  Any suggestions?

So, the current text is:
This error allowed multiple versions of the same row to become visible
to queries, resulting in apparent duplicates. Since the error is in WAL
replay, it would only manifest during crash recovery or on standby
servers.

what about:

The most prominent consequence of this bug is that rows can appear to
not exist when accessed via an index, while still being visible in
sequential scans. This in turn can lead to constraints, including unique
and foreign key ones, to be violated lateron.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] First-draft release notes for next week's releases

2014-03-17 Thread Tom Lane
Andres Freund and...@2ndquadrant.com writes:
 On 2014-03-17 10:03:52 -0700, Josh Berkus wrote:
 First, see suggested text in my first-draft release announcement.

 I don't think that text is any better, it's imo even wrong:
 The bug causes rows to vanish from indexes during recovery due to
 simultaneous updates of rows on both sides of a foreign key.

 Neither is a foreign key, nor simultaneous updates, nor both sides a
 prerequisite.

What I've got at the moment is

  This error caused updated rows to disappear from indexes, resulting
  in inconsistent query results depending on whether an index scan was
  used.  Subsequent processing could result in unique-key violations,
  since the previously updated row would not be found by later index
  searches.  Since this error is in WAL replay, it would only manifest
  during crash recovery or on standby servers.  The improperly-replayed
  case can arise when a table row that is referenced by a foreign-key
  constraint is updated concurrently with creation of a
  referencing row.

OK, or not?  The time window for bikeshedding this is dwindling rapidly.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] First-draft release notes for next week's releases

2014-03-17 Thread Andres Freund
On 2014-03-17 13:42:59 -0400, Tom Lane wrote:
 Andres Freund and...@2ndquadrant.com writes:
  On 2014-03-17 10:03:52 -0700, Josh Berkus wrote:
  First, see suggested text in my first-draft release announcement.
 
  I don't think that text is any better, it's imo even wrong:
  The bug causes rows to vanish from indexes during recovery due to
  simultaneous updates of rows on both sides of a foreign key.
 
  Neither is a foreign key, nor simultaneous updates, nor both sides a
  prerequisite.
 
 What I've got at the moment is
 
   This error caused updated rows to disappear from indexes, resulting
   in inconsistent query results depending on whether an index scan was
   used.  Subsequent processing could result in unique-key violations,
   since the previously updated row would not be found by later index
   searches.  Since this error is in WAL replay, it would only manifest
   during crash recovery or on standby servers.  The improperly-replayed
   case can arise when a table row that is referenced by a foreign-key
   constraint is updated concurrently with creation of a
   referencing row.
 
 OK, or not?  The time window for bikeshedding this is dwindling rapidly.

That's much better, yes. Two things:

* I'd change the warning about unique key violations into a more general
  one about constraints. Foreign key and exclusion constraint are also
  affected...
* I wonder if we should make the possible origins a bit more
  general as it's perfectly possible to trigger the problem without
  foreign keys. Maybe: can arise when a table row that has been updated
  is row locked; that can e.g. happen when foreign keys are used.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] First-draft release notes for next week's releases

2014-03-17 Thread Andres Freund
On 2014-03-17 11:28:45 -0400, Tom Lane wrote:
 Hm ... rows disappearing from indexes might make people think that
 they could fix or mitigate the damage via REINDEX.

Good point. I guess in some cases it will end up working because
VACUUM/hot pruning have cleaned up the mess, but that's certainly not
something I'd want to rely upon. They very well could have messed up
things when presented with bogus input data.

  That's not really
 true though is it?  It looks to me like IndexBuildHeapScan will suffer
 an Assert failure in an assert-enabled build, or build a bogus index
 if not assert-enabled, when it comes across a heap-only tuple that
 has no parent.
 
 I'm thinking we'd better promote that Assert to a normal runtime elog.

I wonder if we also should make rewriteheap.c warn about such
things. Although I don't immediately see a trivial way to do so.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] First-draft release notes for next week's releases

2014-03-17 Thread Tom Lane
Andres Freund and...@2ndquadrant.com writes:
 That's much better, yes. Two things:

 * I'd change the warning about unique key violations into a more general
   one about constraints. Foreign key and exclusion constraint are also
   affected...

I'll see what I can do.

 * I wonder if we should make the possible origins a bit more
   general as it's perfectly possible to trigger the problem without
   foreign keys. Maybe: can arise when a table row that has been updated
   is row locked; that can e.g. happen when foreign keys are used.

IIUC, this case only occurs when using the new-in-9.3 types of
nonexclusive row locks.  I'm willing to bet that the number of
applications using those is negligible; so I think it's all right to not
mention that case explicitly, as long as the wording doesn't say that
foreign keys are the *only* cause (which I didn't).

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] First-draft release notes for next week's releases

2014-03-17 Thread Andres Freund
On 2014-03-17 14:01:03 -0400, Tom Lane wrote:
 Andres Freund and...@2ndquadrant.com writes:
  * I wonder if we should make the possible origins a bit more
general as it's perfectly possible to trigger the problem without
foreign keys. Maybe: can arise when a table row that has been updated
is row locked; that can e.g. happen when foreign keys are used.
 
 IIUC, this case only occurs when using the new-in-9.3 types of
 nonexclusive row locks.  I'm willing to bet that the number of
 applications using those is negligible; so I think it's all right to not
 mention that case explicitly, as long as the wording doesn't say that
 foreign keys are the *only* cause (which I didn't).

I actually think the issue could also occur with row locks of other
severities (is that the correct term?). Alvaro probably knows better,
but if I see correctly it's also triggerable if a backend waits for an
updating transaction to finish and follow_updates = true is passed to
heap_lock_tuple(). Which e.g. nodeLockRows.c does...

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] First-draft release notes for next week's releases

2014-03-17 Thread Tom Lane
Andres Freund and...@2ndquadrant.com writes:
 On 2014-03-17 14:01:03 -0400, Tom Lane wrote:
 IIUC, this case only occurs when using the new-in-9.3 types of
 nonexclusive row locks.  I'm willing to bet that the number of
 applications using those is negligible; so I think it's all right to not
 mention that case explicitly, as long as the wording doesn't say that
 foreign keys are the *only* cause (which I didn't).

 I actually think the issue could also occur with row locks of other
 severities (is that the correct term?).

The commit log entry says

We were resetting the tuple's HEAP_HOT_UPDATED flag as well as t_ctid on
WAL replay of a tuple-lock operation, which is incorrect when the tuple
is already updated.

Back-patch to 9.3.  The clearing of both header elements was there
previously, but since no update could be present on a tuple that was
being locked, it was harmless.

which I read to mean that the case can't occur with the types of row
locks that were allowed pre-9.3.

 but if I see correctly it's also triggerable if a backend waits for an
 updating transaction to finish and follow_updates = true is passed to
 heap_lock_tuple(). Which e.g. nodeLockRows.c does...

That sounds backwards.  nodeLockRows locks the latest tuple in the chain,
so it can't be subject to this.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] First-draft release notes for next week's releases

2014-03-17 Thread Andres Freund
On 2014-03-17 14:16:41 -0400, Tom Lane wrote:
 Andres Freund and...@2ndquadrant.com writes:
  On 2014-03-17 14:01:03 -0400, Tom Lane wrote:
  IIUC, this case only occurs when using the new-in-9.3 types of
  nonexclusive row locks.  I'm willing to bet that the number of
  applications using those is negligible; so I think it's all right to not
  mention that case explicitly, as long as the wording doesn't say that
  foreign keys are the *only* cause (which I didn't).
 
  I actually think the issue could also occur with row locks of other
  severities (is that the correct term?).
 
 The commit log entry says
 
 We were resetting the tuple's HEAP_HOT_UPDATED flag as well as t_ctid on
 WAL replay of a tuple-lock operation, which is incorrect when the tuple
 is already updated.
 
 Back-patch to 9.3.  The clearing of both header elements was there
 previously, but since no update could be present on a tuple that was
 being locked, it was harmless.
 
 which I read to mean that the case can't occur with the types of row
 locks that were allowed pre-9.3.

That's not an unreasonable interpretation of the commit message, but I
don't think it's correct with respect to the code :(

  but if I see correctly it's also triggerable if a backend waits for an
  updating transaction to finish and follow_updates = true is passed to
  heap_lock_tuple(). Which e.g. nodeLockRows.c does...
 
 That sounds backwards.  nodeLockRows locks the latest tuple in the chain,
 so it can't be subject to this.

Hm, I don't see anything in the code preventing it, that's the
lock_tuple() before the EPQ stuff... in ExecLockRows():
foreach(lc, node-lr_arowMarks)
{
test = heap_lock_tuple(erm-relation, tuple,
   
estate-es_output_cid,
   lockmode, 
erm-noWait, true,
   buffer, hufd);
ReleaseBuffer(buffer);
switch (test)
{
case HeapTupleSelfUpdated:
...

the true passed to heap_lock_tuple() is the follow_updates
parameter. And then in heap_lock_tuple():
if (require_sleep)
{
if (infomask  HEAP_XMAX_IS_MULTI)
{
...
/* if there are updates, follow the update 
chain */
if (follow_updates 
!HEAP_XMAX_IS_LOCKED_ONLY(infomask))
{
HTSU_Result res;

res = heap_lock_updated_tuple(relation, 
tuple, t_ctid,

  GetCurrentTransactionId(),
...
else
{
/* wait for regular transaction to end */
if (nowait)
{
if 
(!ConditionalXactLockTableWait(xwait))
ereport(ERROR,

(errcode(ERRCODE_LOCK_NOT_AVAILABLE),
 errmsg(could 
not obtain lock on row in relation \%s\,

RelationGetRelationName(relation;
}
else
XactLockTableWait(xwait);

/* if there are updates, follow the update 
chain */
if (follow_updates 
!HEAP_XMAX_IS_LOCKED_ONLY(infomask))
...
if (RelationNeedsWAL(relation))
{
xl_heap_lock xlrec;
XLogRecPtr  recptr;
XLogRecData rdata[2];

xlrec.target.node = relation-rd_node;
xlrec.target.tid = tuple-t_self;
...

To me that looks sufficient to trigger the bug, because we're issuing a
wal record about the row that was passed to heap_lock_update(), not the
latest one in the ctid chain. When replaying that record, it will reset
the t_ctid field, thus breaking the chain.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] First-draft release notes for next week's releases

2014-03-17 Thread Tom Lane
Andres Freund and...@2ndquadrant.com writes:
 To me that looks sufficient to trigger the bug, because we're issuing a
 wal record about the row that was passed to heap_lock_update(), not the
 latest one in the ctid chain. When replaying that record, it will reset
 the t_ctid field, thus breaking the chain.

[ scratches head ... ]  If that's what's happening, isn't it a bug in
itself?  Surely the WAL record ought to point at the tuple that was
locked.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] First-draft release notes for next week's releases

2014-03-17 Thread Andres Freund
On 2014-03-17 14:29:56 -0400, Tom Lane wrote:
 Andres Freund and...@2ndquadrant.com writes:
  To me that looks sufficient to trigger the bug, because we're issuing a
  wal record about the row that was passed to heap_lock_update(), not the
  latest one in the ctid chain. When replaying that record, it will reset
  the t_ctid field, thus breaking the chain.
 
 [ scratches head ... ]  If that's what's happening, isn't it a bug in
 itself?  Surely the WAL record ought to point at the tuple that was
 locked.

There's a separate XLOG_HEAP2_LOCK_UPDATED record, for every later tuple
version, emitted by heap_lock_updated_tuple_rec(). This really is mind
bendingly complex :(.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] First-draft release notes for next week's releases

2014-03-17 Thread Andres Freund
On 2014-03-17 14:52:25 -0400, Tom Lane wrote:
 Andres Freund and...@2ndquadrant.com writes:
  On 2014-03-17 14:29:56 -0400, Tom Lane wrote:
  [ scratches head ... ]  If that's what's happening, isn't it a bug in
  itself?  Surely the WAL record ought to point at the tuple that was
  locked.
 
  There's a separate XLOG_HEAP2_LOCK_UPDATED record, for every later tuple
  version, emitted by heap_lock_updated_tuple_rec(). This really is mind
  bendingly complex :(.
 
 Ah, I see; so only the original tuple in the chain is at risk?

Depending on what you define the original tuple in the chain to
be. No, if you happen to mean the root tuple of a ctid chain or similar;
which I guess you didn't. Yes, if you mean the tuplepassed to
heap_lock_tuple(). heap_xlog_lock_updated() looks (and has looked)
correct.

 How about this:

Sounds good to me.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] First-draft release notes for next week's releases

2014-03-17 Thread Tom Lane
Andres Freund and...@2ndquadrant.com writes:
 On 2014-03-17 14:29:56 -0400, Tom Lane wrote:
 [ scratches head ... ]  If that's what's happening, isn't it a bug in
 itself?  Surely the WAL record ought to point at the tuple that was
 locked.

 There's a separate XLOG_HEAP2_LOCK_UPDATED record, for every later tuple
 version, emitted by heap_lock_updated_tuple_rec(). This really is mind
 bendingly complex :(.

Ah, I see; so only the original tuple in the chain is at risk?

How about this:

  This error caused updated rows to not be found by index scans, resulting
  in inconsistent query results depending on whether an index scan was
  used.  Subsequent processing could result in constraint violations,
  since the previously updated row would not be found by later index
  searches, thus possibly allowing conflicting rows to be inserted.
  Since this error is in WAL replay, it would only manifest during crash
  recovery or on standby servers.  The improperly-replayed case most
  commonly arises when a table row that is referenced by a foreign-key
  constraint is updated concurrently with creation of a referencing row;
  but it can also occur when any variant of commandSELECT FOR UPDATE/
  is applied to a row that is being concurrently updated.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] First-draft release notes for next week's releases

2014-03-17 Thread Alvaro Herrera
Andres Freund wrote:
 On 2014-03-17 14:01:03 -0400, Tom Lane wrote:
  Andres Freund and...@2ndquadrant.com writes:
   * I wonder if we should make the possible origins a bit more
 general as it's perfectly possible to trigger the problem without
 foreign keys. Maybe: can arise when a table row that has been updated
 is row locked; that can e.g. happen when foreign keys are used.
  
  IIUC, this case only occurs when using the new-in-9.3 types of
  nonexclusive row locks.  I'm willing to bet that the number of
  applications using those is negligible; so I think it's all right to not
  mention that case explicitly, as long as the wording doesn't say that
  foreign keys are the *only* cause (which I didn't).
 
 I actually think the issue could also occur with row locks of other
 severities (is that the correct term?). Alvaro probably knows better,
 but if I see correctly it's also triggerable if a backend waits for an
 updating transaction to finish and follow_updates = true is passed to
 heap_lock_tuple(). Which e.g. nodeLockRows.c does...

Uhm.  But at the bottom of that block, right above the failed: label
(heapam.c line 4527 in current master), we recheck the tuple for
locked-only-ness; and fail the whole operation by returning
HeapTupleUpdated, if it's not locked-only, no?  Which would cause
ExecLockRows to grab the next version via EvalPlanQualFetch.
Essentially that check is a lock-conflict test, and the only thing that
does not conflict with an update is a FOR KEY SHARE lock.

Note the only way to pass that test is that either the tuple is
locked-only (spelled in three different ways), or !require_sleep.

Am I completely misunderstanding what's being said here?

-- 
Álvaro Herrerahttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training  Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] First-draft release notes for next week's releases

2014-03-17 Thread Tom Lane
Alvaro Herrera alvhe...@2ndquadrant.com writes:
 Uhm.  But at the bottom of that block, right above the failed: label
 (heapam.c line 4527 in current master), we recheck the tuple for
 locked-only-ness; and fail the whole operation by returning
 HeapTupleUpdated, if it's not locked-only, no?  Which would cause
 ExecLockRows to grab the next version via EvalPlanQualFetch.
 Essentially that check is a lock-conflict test, and the only thing that
 does not conflict with an update is a FOR KEY SHARE lock.

Right, the row-lock acquisition has to succeed for there to be a risk.
I wasn't sure whether 9.3 had introduced any such cases for existing
row lock types.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] First-draft release notes for next week's releases

2014-03-17 Thread Andres Freund
On 2014-03-17 16:17:35 -0300, Alvaro Herrera wrote:
 Andres Freund wrote:
  On 2014-03-17 14:01:03 -0400, Tom Lane wrote:
   Andres Freund and...@2ndquadrant.com writes:
* I wonder if we should make the possible origins a bit more
  general as it's perfectly possible to trigger the problem without
  foreign keys. Maybe: can arise when a table row that has been updated
  is row locked; that can e.g. happen when foreign keys are used.
   
   IIUC, this case only occurs when using the new-in-9.3 types of
   nonexclusive row locks.  I'm willing to bet that the number of
   applications using those is negligible; so I think it's all right to not
   mention that case explicitly, as long as the wording doesn't say that
   foreign keys are the *only* cause (which I didn't).
  
  I actually think the issue could also occur with row locks of other
  severities (is that the correct term?). Alvaro probably knows better,
  but if I see correctly it's also triggerable if a backend waits for an
  updating transaction to finish and follow_updates = true is passed to
  heap_lock_tuple(). Which e.g. nodeLockRows.c does...
 
 Uhm.  But at the bottom of that block, right above the failed: label
 (heapam.c line 4527 in current master), we recheck the tuple for
 locked-only-ness; and fail the whole operation by returning
 HeapTupleUpdated, if it's not locked-only, no?  Which would cause
 ExecLockRows to grab the next version via EvalPlanQualFetch.
 Essentially that check is a lock-conflict test, and the only thing that
 does not conflict with an update is a FOR KEY SHARE lock.

What I was thinking of is the case where heap_lock_tuple() notices it
needs to sleep and then in the if (require_sleep) block does a
lock_updated_tuple(). If the updating transaction aborts while waiting
lock_updated_tuple_rec() will issue a XLOG_HEAP2_LOCK_UPDATED for that
row and then return MayBeUpdated. Which will make heap_lock_tuple()
successfully lock the row, thereby resetting t_ctid during replay. What
I missed is that case resetting the ctid chain is perfectly fine, since
the pointed to tuple is actually dead. I was just confused by the fact
that we do actually issue a XLOG_HEAP2_LOCK_UPDATED for a dead row.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] First-draft release notes for next week's releases

2014-03-17 Thread Greg Stark
On Mon, Mar 17, 2014 at 3:28 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 I'm thinking we'd better promote that Assert to a normal runtime elog.


I wasn't sure about this but on further thought I think it's a really
good idea and should be mentioned in the release notes. One of the
things that's been bothering me about this bug is that it's really
hard to be know if your standby has suffered from the problem and if
you've promoted it you don't know if your database has a problem.

That said, it would be nice to actually fix the problem, not just
detect it. Eventually vacuum would fix the problem. I think. I'm not
really sure what will happen actually.

-- 
greg


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] First-draft release notes for next week's releases

2014-03-17 Thread Andres Freund
On 2014-03-17 21:09:10 +, Greg Stark wrote:
 That said, it would be nice to actually fix the problem, not just
 detect it. Eventually vacuum would fix the problem. I think. I'm not
 really sure what will happen actually.

Indexes will quite possibly stay corrupted. I think. If there was a
index lookup for a affected row, the kill_prior_tuple logic will have
quite possibly have zapped the index entry.

Aside from that, it looks like VACUUM will have a hard time cleaning up
as well. It looks to me like heap_prune_chain() won't remove tuples that
are marked as both HeapTupleHeaderIsHeapOnly() and
HeapTupleHeaderIsHotUpdated(), i.e. intermediate tuples in a HOT
chain. Neither will lazy_scan_heap()...

I think the best way to really cleanup a table is to use something like:
ALTER TABLE rew ALTER COLUMN data TYPE text USING (data);
where text is the previous type of the column. That should trigger a
full table rewrite, without any finesse about tracking ctid chains.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] First-draft release notes for next week's releases

2014-03-17 Thread Alvaro Herrera
Andres Freund wrote:
 On 2014-03-17 21:09:10 +, Greg Stark wrote:
  That said, it would be nice to actually fix the problem, not just
  detect it. Eventually vacuum would fix the problem. I think. I'm not
  really sure what will happen actually.
 
 Indexes will quite possibly stay corrupted. I think. If there was a
 index lookup for a affected row, the kill_prior_tuple logic will have
 quite possibly have zapped the index entry.
 
 Aside from that, it looks like VACUUM will have a hard time cleaning up
 as well. It looks to me like heap_prune_chain() won't remove tuples that
 are marked as both HeapTupleHeaderIsHeapOnly() and
 HeapTupleHeaderIsHotUpdated(), i.e. intermediate tuples in a HOT
 chain. Neither will lazy_scan_heap()...

Ugh :-(

 I think the best way to really cleanup a table is to use something like:
 ALTER TABLE rew ALTER COLUMN data TYPE text USING (data);
 where text is the previous type of the column. That should trigger a
 full table rewrite, without any finesse about tracking ctid chains.

Isn't this what VACUUM FULL does?

-- 
Álvaro Herrerahttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training  Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] First-draft release notes for next week's releases

2014-03-17 Thread Tom Lane
Andres Freund and...@2ndquadrant.com writes:
 On 2014-03-17 21:09:10 +, Greg Stark wrote:
 That said, it would be nice to actually fix the problem, not just
 detect it. Eventually vacuum would fix the problem. I think. I'm not
 really sure what will happen actually.

 Indexes will quite possibly stay corrupted. I think. If there was a
 index lookup for a affected row, the kill_prior_tuple logic will have
 quite possibly have zapped the index entry.

Whether it did or not, there's no way for the index entry to reach the
now-live tuple version (if it was a HOT update), so the question is moot.
What seems more interesting is whether REINDEX could fix the problem,
but at least with the current logic in catalog/index.c the answer seems
to be no.

It's possible that a REINDEX attempt would work to detect whether you have
a problem (ie, see if you get one of the errors I just added).  I'm not
sure that's bulletproof though.

 I think the best way to really cleanup a table is to use something like:
 ALTER TABLE rew ALTER COLUMN data TYPE text USING (data);
 where text is the previous type of the column. That should trigger a
 full table rewrite, without any finesse about tracking ctid chains.

Um... don't we have logic in there that's smart enough to short-circuit
that?

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] First-draft release notes for next week's releases

2014-03-17 Thread Andres Freund
On 2014-03-17 20:51:31 -0300, Alvaro Herrera wrote:
  I think the best way to really cleanup a table is to use something like:
  ALTER TABLE rew ALTER COLUMN data TYPE text USING (data);
  where text is the previous type of the column. That should trigger a
  full table rewrite, without any finesse about tracking ctid chains.
 
 Isn't this what VACUUM FULL does?

No, it uses rewriteheap.c via cluster.c, which tries to preserve
visibility information. There's tracking/mapping of t_ctid... I am not
entirely sure how much it preserves, but I am not really hopeful.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] First-draft release notes for next week's releases

2014-03-17 Thread Andres Freund
On 2014-03-17 19:55:01 -0400, Tom Lane wrote:
  I think the best way to really cleanup a table is to use something like:
  ALTER TABLE rew ALTER COLUMN data TYPE text USING (data);
  where text is the previous type of the column. That should trigger a
  full table rewrite, without any finesse about tracking ctid chains.
 
 Um... don't we have logic in there that's smart enough to short-circuit
 that?

Not with USING present I think?

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] First-draft release notes for next week's releases

2014-03-17 Thread Tom Lane
Alvaro Herrera alvhe...@2ndquadrant.com writes:
 Andres Freund wrote:
 I think the best way to really cleanup a table is to use something like:
 ALTER TABLE rew ALTER COLUMN data TYPE text USING (data);
 where text is the previous type of the column. That should trigger a
 full table rewrite, without any finesse about tracking ctid chains.

 Isn't this what VACUUM FULL does?

AFAIR, both VACUUM FULL and CLUSTER will attempt to preserve update
chains, and thus will probably get confused by this bug (though I've
not looked into exactly what will happen).  I'm not real sure that ALTER
TABLE is any better --- doesn't all that stuff go through rewriteheap.c
now?

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] First-draft release notes for next week's releases

2014-03-17 Thread Andres Freund
On 2014-03-17 19:58:18 -0400, Tom Lane wrote:
 Alvaro Herrera alvhe...@2ndquadrant.com writes:
  Andres Freund wrote:
  I think the best way to really cleanup a table is to use something like:
  ALTER TABLE rew ALTER COLUMN data TYPE text USING (data);
  where text is the previous type of the column. That should trigger a
  full table rewrite, without any finesse about tracking ctid chains.
 
  Isn't this what VACUUM FULL does?
 
 AFAIR, both VACUUM FULL and CLUSTER will attempt to preserve update
 chains, and thus will probably get confused by this bug (though I've
 not looked into exactly what will happen).  I'm not real sure that ALTER
 TABLE is any better --- doesn't all that stuff go through rewriteheap.c
 now?

Nope, it's using ATRewriteTable(), which just does a loop around
heap_getnext() and does a plain heap_insert(). Normally that's rather
annoying because it completely destroys visibility information, but in
this case...

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] First-draft release notes for next week's releases

2014-03-17 Thread Tom Lane
Andres Freund and...@2ndquadrant.com writes:
 On 2014-03-17 19:55:01 -0400, Tom Lane wrote:
 I think the best way to really cleanup a table is to use something like:
 ALTER TABLE rew ALTER COLUMN data TYPE text USING (data);

 Um... don't we have logic in there that's smart enough to short-circuit
 that?

 Not with USING present I think?

regression=# create table foo as select generate_series(1,100) as data;
SELECT 100
Time: 575.113 ms
regression=# alter table foo alter column data type int4 using (data);
ALTER TABLE
Time: 1001.859 ms
regression=# alter table foo alter column data type int4 using (data);
ALTER TABLE
Time: 0.821 ms
regression=# alter table foo alter column data type int8 using (data);
ALTER TABLE
Time: 677.810 ms
regression=# alter table foo alter column data type int8 using (data);
ALTER TABLE
Time: 1.085 ms

Looks like it's smart enough to me.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] First-draft release notes for next week's releases

2014-03-16 Thread Josh Berkus
On 03/15/2014 01:02 PM, Tom Lane wrote:
 First-draft release notes are committed, and should be visible at
 http://www.postgresql.org/docs/devel/static/release-9-3-4.html
 once guaibasaurus does its next buildfarm run a few minutes from
 now.  Any suggestions?

Hmmm, not sure I like this.  It's complicated without being complete,
and supplies just enough information to get someone into trouble:

Also, the error fixed in the second changelog entry below could have
caused some bloat in statistics data. Users who have done many DROP
DATABASE commands since upgrading to 9.3 may wish to manually remove
files in $PGDATA/pg_stat_tmp (or $PGDATA/pg_stat if the server is not
running) that have old modification times and do not correspond to any
database OID present in $PGDATA/base. If you do this, note that the file
db_0.stat is a valid file even though it does not correspond to any
$PGDATA/base subdirectory.

I kind of think that either we should provide complete instructions
(which would be about 3/4 of a page), or provide limited instructions
and assume the only users who will do this are ones who already
understand pg_stat (a reasonable assumption in my opinion), so my
suggestion is move the advice paragraph from E 1.1 to the individual fix
entry in E.1.2, and change it to this:


* Remove the correct per-database statistics file during DROP DATABASE
(Tomas Vondra)

This fix prevents a permanent leak of statistics file space.

Users who have done many DROP DATABASE commands in PostgreSQL 9.3 may
wish to examine their statistics directory for statistics files which do
not correspond to any existing database and delete them.  Please note
that db_0.stat is a needed statistics file.





-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] First-draft release notes for next week's releases

2014-03-16 Thread Greg Stark
This is not really accurate:

This error allowed multiple versions of the same row to become
visible to queries, resulting in apparent duplicates. Since the error
is in WAL replay, it would only manifest during crash recovery or on
standby servers.

I think the idea is coming from what the second sentence below is
getting at but it may be too complex to explain in a release note:

The error causes some rows to disappear from indexes resulting in
inconsistent query results on a hot standby depending on whether
indexes are used. If the standby is subsequently activated or if it
occurs during recovery after a crash or backup restore it could result
in unique constraint violations as well.

I would consider adding something like For the problem to occur a
foreign key from another table must exist and a new row must be added
to that other table around the same time (possibly in the same
transaction) as an update to the referenced row That would help
people judge whether their databases are vulnerable. If they don't
have foreign keys or if they have a coding pattern that causes this to
happen regularly then they should be able to figure that out and
possibly disable them if they can't update promptly.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] First-draft release notes for next week's releases

2014-03-16 Thread Josh Berkus
On 03/16/2014 12:32 PM, Greg Stark wrote:
 I would consider adding something like For the problem to occur a
 foreign key from another table must exist and a new row must be added
 to that other table around the same time (possibly in the same
 transaction) as an update to the referenced row That would help
 people judge whether their databases are vulnerable. If they don't
 have foreign keys or if they have a coding pattern that causes this to
 happen regularly then they should be able to figure that out and
 possibly disable them if they can't update promptly.

I don't think that will actually help people know whether they're
vulnerable without a longer explanation.

It's starting to sound like we need a wiki page for this release?

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] First-draft release notes for next week's releases

2014-03-15 Thread Tom Lane
First-draft release notes are committed, and should be visible at
http://www.postgresql.org/docs/devel/static/release-9-3-4.html
once guaibasaurus does its next buildfarm run a few minutes from
now.  Any suggestions?

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers