Re: Bugs in TOAST handling, OID assignment and redo recovery

2018-07-23 Thread Alvaro Herrera
On 2018-Apr-11, Peter Eisentraut wrote:

> On 4/10/18 06:29, Pavan Deolasee wrote:
> > One of our 2ndQuadrant support customers recently reported a sudden rush
> > of TOAST errors post a crash recovery, nearly causing an outage. Most
> > errors read like this:
> > 
> > ERROR: unexpected chunk number 0 (expected 1) for toast value 
> 
> While researching this, I found that the terminology in this code is
> quite inconsistent.  It talks about chunks ids, chunk indexes, chunk
> numbers, etc. seemingly interchangeably.  The above error is actually
> about the chunk_seq, not about the chunk_id, as one might think.
> 
> The attached patch is my attempt to clean this up a bit.  Thoughts?

I didn't review your patch carefully, but +1 on the idea.  By all means
let's not forget this for pg12.

-- 
Álvaro Herrerahttps://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Bugs in TOAST handling, OID assignment and redo recovery

2018-04-24 Thread Peter Eisentraut
On 4/18/18 02:37, Pavan Deolasee wrote:
> While I agree that we should clean it up, I wonder if changing error
> text would be a good idea. These errors are being reported by a very
> long time and if we change the text, we might forget the knowledge about
> the past reports.

It's presumably fixed now, so if we see the new error message, we'll
know it's a different issue.

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Bugs in TOAST handling, OID assignment and redo recovery

2018-04-24 Thread Bruce Momjian
On Wed, Apr 18, 2018 at 12:07:52PM +0530, Pavan Deolasee wrote:
> 
> 
> On Thu, Apr 12, 2018 at 5:53 AM, Peter Eisentraut <
> peter.eisentr...@2ndquadrant.com> wrote:
> 
> On 4/10/18 06:29, Pavan Deolasee wrote:
> > One of our 2ndQuadrant support customers recently reported a sudden rush
> > of TOAST errors post a crash recovery, nearly causing an outage. Most
> > errors read like this:
> >
> > ERROR: unexpected chunk number 0 (expected 1) for toast value 
> 
> While researching this, I found that the terminology in this code is
> quite inconsistent.  It talks about chunks ids, chunk indexes, chunk
> numbers, etc. seemingly interchangeably.  The above error is actually
> about the chunk_seq, not about the chunk_id, as one might think.
> 
> The attached patch is my attempt to clean this up a bit.  Thoughts?
> 
> 
> While I agree that we should clean it up, I wonder if changing error text 
> would
> be a good idea. These errors are being reported by a very long time and if we
> change the text, we might forget the knowledge about the past reports.
> 
> Also, "toast value" is same as "chunk_id". Should we clean up something there
> too? "chunk_seq number" -- should that be just "chunk_seq"?

We can put a comment next to the error message C string if we want to
keep historical knowledge.

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+  Ancient Roman grave inscription +



Re: Bugs in TOAST handling, OID assignment and redo recovery

2018-04-17 Thread Pavan Deolasee
On Thu, Apr 12, 2018 at 5:53 AM, Peter Eisentraut <
peter.eisentr...@2ndquadrant.com> wrote:

> On 4/10/18 06:29, Pavan Deolasee wrote:
> > One of our 2ndQuadrant support customers recently reported a sudden rush
> > of TOAST errors post a crash recovery, nearly causing an outage. Most
> > errors read like this:
> >
> > ERROR: unexpected chunk number 0 (expected 1) for toast value 
>
> While researching this, I found that the terminology in this code is
> quite inconsistent.  It talks about chunks ids, chunk indexes, chunk
> numbers, etc. seemingly interchangeably.  The above error is actually
> about the chunk_seq, not about the chunk_id, as one might think.
>
> The attached patch is my attempt to clean this up a bit.  Thoughts?
>

While I agree that we should clean it up, I wonder if changing error text
would be a good idea. These errors are being reported by a very long time
and if we change the text, we might forget the knowledge about the past
reports.

Also, "toast value" is same as "chunk_id". Should we clean up something
there too? "chunk_seq number" -- should that be just "chunk_seq"?


Thanks,
Pavan

-- 
 Pavan Deolasee   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: Bugs in TOAST handling, OID assignment and redo recovery

2018-04-12 Thread Tom Lane
Pavan Deolasee  writes:
> On Thu, Apr 12, 2018 at 1:58 AM, Tom Lane  wrote:
>> So while looking at this, it suddenly occurred to me that probing with
>> SnapshotDirty isn't that safe for regular (non-TOAST) Oid assignment
>> either.

> Yeah it occurred to me as well, but when I looked at the code, I couldn't
> find a case that is broken. I even tried a few test cases with DDLs etc.

I think it probably can't happen for catalog MVCC scans, because we
generally take full new snapshots for those.  The situation that would be
hazardous is where we use an existing snapshot (so that it doesn't see
the just-committed Other Transaction) but advance its command counter
(so that it can see the new object we just made).  So the sort of failure
I'd predict is that a user query just after an object creation could see
duplicate OIDs in the catalogs.  To get to that point, you might need a
stable function (using the troublesome snapshot) calling a volatile one
(which contains the actual DDL).

> But I think what you did is fine and more bullet proof. So +1 to that.

Yeah, even if this isn't actually a reachable case, it's hard to argue
that we should complicate the code and take API-break risks in the back
branches to make a tiny optimization on the assumption that it can never
happen.

regards, tom lane



Re: Bugs in TOAST handling, OID assignment and redo recovery

2018-04-12 Thread Pavan Deolasee
On Thu, Apr 12, 2018 at 5:21 AM, Tom Lane  wrote:

> Michael Paquier  writes:
> > I have not really checked this thread in details, but one thing that
> > strikes me is that it would be rather easy to add a TAP test based on
> > the initial script that Pavan has sent.  Would that be worth testing
> > cycles or not?
>
> I doubt it --- that test is so specialized that it'd be unlikely to
> catch any other bug.
>
>
I agree; it's not worth adding a TAP test except may be as a demonstration
to write future tests for catching such issues. This was a very specialised
test case written after getting a full grasp on the bug. And it just tests
the thing that I knew is broken based on code reading. Also, with OID
duplicate issue fixed, hitting more bugs in this area is going to be even
more difficult.

Thanks,
Pavan

-- 
 Pavan Deolasee   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: Bugs in TOAST handling, OID assignment and redo recovery

2018-04-12 Thread Pavan Deolasee
On Thu, Apr 12, 2018 at 1:58 AM, Tom Lane  wrote:

> So while looking at this, it suddenly occurred to me that probing with
> SnapshotDirty isn't that safe for regular (non-TOAST) Oid assignment
> either.
>

Yeah it occurred to me as well, but when I looked at the code, I couldn't
find a case that is broken. I even tried a few test cases with DDLs etc.
But I think what you did is fine and more bullet proof. So +1 to that.

Thanks,
Pavan

-- 
 Pavan Deolasee   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: Bugs in TOAST handling, OID assignment and redo recovery

2018-04-11 Thread Peter Eisentraut
On 4/10/18 06:29, Pavan Deolasee wrote:
> One of our 2ndQuadrant support customers recently reported a sudden rush
> of TOAST errors post a crash recovery, nearly causing an outage. Most
> errors read like this:
> 
> ERROR: unexpected chunk number 0 (expected 1) for toast value 

While researching this, I found that the terminology in this code is
quite inconsistent.  It talks about chunks ids, chunk indexes, chunk
numbers, etc. seemingly interchangeably.  The above error is actually
about the chunk_seq, not about the chunk_id, as one might think.

The attached patch is my attempt to clean this up a bit.  Thoughts?

-- 
Peter Eisentraut  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
From db7f5eacc23114f24230dd087837d94bcefc4c88 Mon Sep 17 00:00:00 2001
From: Peter Eisentraut 
Date: Wed, 11 Apr 2018 20:11:35 -0400
Subject: [PATCH] Clarify TOAST terminology in variables and error messages

---
 src/backend/access/heap/tuptoaster.c | 67 ++--
 1 file changed, 34 insertions(+), 33 deletions(-)

diff --git a/src/backend/access/heap/tuptoaster.c 
b/src/backend/access/heap/tuptoaster.c
index cd42c50b09..4113cb29d2 100644
--- a/src/backend/access/heap/tuptoaster.c
+++ b/src/backend/access/heap/tuptoaster.c
@@ -1882,10 +1882,10 @@ toast_fetch_datum(struct varlena *attr)
struct varlena *result;
struct varatt_external toast_pointer;
int32   ressize;
-   int32   residx,
-   nextidx;
+   int32   res_seq,
+   next_seq;
int32   numchunks;
-   Pointer chunk;
+   Pointer chunk_data;
boolisnull;
char   *chunkdata;
int32   chunksize;
@@ -1930,13 +1930,13 @@ toast_fetch_datum(struct varlena *attr)
ObjectIdGetDatum(toast_pointer.va_valueid));
 
/*
-* Read the chunks by index
+* Read the chunks by sequence number
 *
-* Note that because the index is actually on (valueid, chunkidx) we 
will
-* see the chunks in chunkidx order, even though we didn't explicitly 
ask
+* Note that because the index is actually on (chunk_id, chunk_seq) we 
will
+* see the chunks in chunk_seq order, even though we didn't explicitly 
ask
 * for it.
 */
-   nextidx = 0;
+   next_seq = 0;
 
init_toast_snapshot(&SnapshotToast);
toastscan = systable_beginscan_ordered(toastrel, toastidxs[validIndex],
@@ -1946,20 +1946,20 @@ toast_fetch_datum(struct varlena *attr)
/*
 * Have a chunk, extract the sequence number and the data
 */
-   residx = DatumGetInt32(fastgetattr(ttup, 2, toasttupDesc, 
&isnull));
+   res_seq = DatumGetInt32(fastgetattr(ttup, 2, toasttupDesc, 
&isnull));
Assert(!isnull);
-   chunk = DatumGetPointer(fastgetattr(ttup, 3, toasttupDesc, 
&isnull));
+   chunk_data = DatumGetPointer(fastgetattr(ttup, 3, toasttupDesc, 
&isnull));
Assert(!isnull);
-   if (!VARATT_IS_EXTENDED(chunk))
+   if (!VARATT_IS_EXTENDED(chunk_data))
{
-   chunksize = VARSIZE(chunk) - VARHDRSZ;
-   chunkdata = VARDATA(chunk);
+   chunksize = VARSIZE(chunk_data) - VARHDRSZ;
+   chunkdata = VARDATA(chunk_data);
}
-   else if (VARATT_IS_SHORT(chunk))
+   else if (VARATT_IS_SHORT(chunk_data))
{
/* could happen due to heap_form_tuple doing its thing 
*/
-   chunksize = VARSIZE_SHORT(chunk) - VARHDRSZ_SHORT;
-   chunkdata = VARDATA_SHORT(chunk);
+   chunksize = VARSIZE_SHORT(chunk_data) - VARHDRSZ_SHORT;
+   chunkdata = VARDATA_SHORT(chunk_data);
}
else
{
@@ -1974,33 +1974,34 @@ toast_fetch_datum(struct varlena *attr)
/*
 * Some checks on the data we've found
 */
-   if (residx != nextidx)
-   elog(ERROR, "unexpected chunk number %d (expected %d) 
for toast value %u in %s",
-residx, nextidx,
+   if (res_seq != next_seq)
+   elog(ERROR, "unexpected chunk_seq %d (expected %d) for 
toast value %u in %s",
+res_seq, next_seq,
 toast_pointer.va_valueid,
 RelationGetRelationName(toastrel));
-   if (residx < numchunks - 1)
+
+   if (res_seq < numchunks - 1)
{
if (chunksize != TOAST_MAX_CHUNK_SIZE)
- 

Re: Bugs in TOAST handling, OID assignment and redo recovery

2018-04-11 Thread Tom Lane
Michael Paquier  writes:
> I have not really checked this thread in details, but one thing that
> strikes me is that it would be rather easy to add a TAP test based on
> the initial script that Pavan has sent.  Would that be worth testing
> cycles or not?

I doubt it --- that test is so specialized that it'd be unlikely to
catch any other bug.

An example of the sort of bug that could exist in the same area would
be if we'd failed to note the places in tuptoaster.c that also need to
be switched to SnapshotAny.  Only a test covering heap-rewrite could
have found that, which this test case surely doesn't.

What I think would actually be worth spending some time on is taking
a hard look at the remaining uses of SnapshotDirty to see if any of
them look squishy.

regards, tom lane



Re: Bugs in TOAST handling, OID assignment and redo recovery

2018-04-11 Thread Michael Paquier
On Wed, Apr 11, 2018 at 04:28:33PM -0400, Tom Lane wrote:
> I propose therefore that the right fix does not require an API change
> for GetNewOidWithIndex: we can just make it use SnapshotAny all the
> time.

I have not really checked this thread in details, but one thing that
strikes me is that it would be rather easy to add a TAP test based on
the initial script that Pavan has sent.  Would that be worth testing
cycles or not?  We do have tests like 008_fsm_truncation.pl which test
similar failure scenarios, so I think that we should have one.
--
Michael


signature.asc
Description: PGP signature


Re: Bugs in TOAST handling, OID assignment and redo recovery

2018-04-11 Thread Tom Lane
So while looking at this, it suddenly occurred to me that probing with
SnapshotDirty isn't that safe for regular (non-TOAST) Oid assignment
either.  SnapshotDirty will consider a row dead the instant the
deleting transaction has committed, but it may remain visible to other
transactions for awhile after that --- and now that we use MVCC for
catalog scans, that applies to them too.  Hence, the existing logic
is capable of creating objects with transiently-conflicting OIDs.
I don't think this could create a conflict that's visible outside
our own transaction, since anyone who can see our commit would also
see the commit of the deleting transaction.  But there's definitely
a hazard inside the transaction that creates a new object.

I propose therefore that the right fix does not require an API change
for GetNewOidWithIndex: we can just make it use SnapshotAny all the
time.

regards, tom lane



Re: Bugs in TOAST handling, OID assignment and redo recovery

2018-04-11 Thread Simon Riggs
On 11 April 2018 at 19:57, Tom Lane  wrote:
> Pavan Deolasee  writes:
>> Ok. I propose attached patches, more polished this time.
>
> I'll take these, unless some other committer is hot to do so?

Please go ahead.

-- 
Simon Riggshttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Bugs in TOAST handling, OID assignment and redo recovery

2018-04-11 Thread Tom Lane
Pavan Deolasee  writes:
> Ok. I propose attached patches, more polished this time.

I'll take these, unless some other committer is hot to do so?

regards, tom lane



Re: Bugs in TOAST handling, OID assignment and redo recovery

2018-04-11 Thread Pavan Deolasee
On Wed, Apr 11, 2018 at 8:20 PM, Tom Lane  wrote:

> Pavan Deolasee  writes:
> > Or may be we simply err on the side of caution and scan the toast table
> > with SnapshotAny while looking for a duplicate? That might prevent us
> from
> > reusing an OID for a known-dead tuple, but should save us a second index
> > scan and still work.
>
> +1.  We really don't want to expend two indexscans on this.
>
>
Ok. I propose attached patches, more polished this time. I also
changed toastrel_valueid_exists() to use SnapshotAny, but I don't have any
proofs to claim that's a needed change. But it seems harmless and given the
number of toast related, hard to chase bugs we have seen over the years,
may it's worth making it full proof too.

Thanks,
Pavan

-- 
 Pavan Deolasee   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


0002-Do-more-extensive-search-while-looking-for-duplicate.patch
Description: Binary data


0001-Do-not-overwrite-the-nextOid-counter-while-replaying.patch
Description: Binary data


Re: Bugs in TOAST handling, OID assignment and redo recovery

2018-04-11 Thread Tom Lane
Pavan Deolasee  writes:
> Or may be we simply err on the side of caution and scan the toast table
> with SnapshotAny while looking for a duplicate? That might prevent us from
> reusing an OID for a known-dead tuple, but should save us a second index
> scan and still work.

+1.  We really don't want to expend two indexscans on this.

I was worried about changing the signature of GetNewOidWithIndex in
a back-patched fix, but after looking around I think that's probably
safe.  External callers really shouldn't be using anything lower-level
than GetNewOid.

regards, tom lane



Re: Bugs in TOAST handling, OID assignment and redo recovery

2018-04-11 Thread Pavan Deolasee
On Tue, Apr 10, 2018 at 7:24 PM, Pavan Deolasee 
wrote:

> Hi Heikki,
>
> On Tue, Apr 10, 2018 at 7:07 PM, Heikki Linnakangas 
> wrote:
>
>>
>>>
>> It would seem more straightforward to add a snapshot parameter to
>> GetNewOidWithIndex(), so that the this one caller could pass SnapshotToast,
>> while others pass SnapshotDirty. With your patch, you check the index
>> twice: first with SnapshotDirty, in GetNewOidWithIndex(), and then with
>> SnapshotToast, in the caller.
>>
>
> Hmm. I actually wrote my first patch exactly like that. I am trying to
> remember why I discarded that approach. Was that to do with the fact that  
> SnapshotToast
> can't see all toast tuples either? Yeah, I think so. For example, it won't
> see tuples with uncommitted-xmin, leading to different issues. Now it's
> unlikely that we will have a OID conflict where the old tuple has
> uncommitted-xmin, but not sure if we can completely rule that out.
>

Or may be we simply err on the side of caution and scan the toast table
with SnapshotAny while looking for a duplicate? That might prevent us from
reusing an OID for a known-dead tuple, but should save us a second index
scan and still work.

Thanks,
Pavan

-- 
 Pavan Deolasee   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: Bugs in TOAST handling, OID assignment and redo recovery

2018-04-10 Thread Pavan Deolasee
Hi Heikki,

On Tue, Apr 10, 2018 at 7:07 PM, Heikki Linnakangas  wrote:

>
>>
> It would seem more straightforward to add a snapshot parameter to
> GetNewOidWithIndex(), so that the this one caller could pass SnapshotToast,
> while others pass SnapshotDirty. With your patch, you check the index
> twice: first with SnapshotDirty, in GetNewOidWithIndex(), and then with
> SnapshotToast, in the caller.
>

Hmm. I actually wrote my first patch exactly like that. I am trying to
remember why I discarded that approach. Was that to do with the fact
that  SnapshotToast
can't see all toast tuples either? Yeah, I think so. For example, it won't
see tuples with uncommitted-xmin, leading to different issues. Now it's
unlikely that we will have a OID conflict where the old tuple has
uncommitted-xmin, but not sure if we can completely rule that out. For
example, if we did not uncover the redo recovery bug, we could have had a
prepared transaction having inserted the old tuple, which now conflicts
with new tuple and not detected by SnapshotToast.



>
> If I'm reading the rewrite case correctly, it's a bit different and quite
> special. In the loop with GetNewOidWithIndex(), it needs to check that the
> OID is unused in two tables, the old TOAST table, and the new one. You can
> only pass one index to GetNewOidWithIndex(), so it needs to check the
> second index manually. It's not because of the snapshot issue. Although I
> wonder if we should be using SnapshotToast in that GetNewOidWithIndex()
> call, too. I.e. if we should be checking both the old and the new toast
> table with SnapshotToast.


As I said, I am not sure if checking with just SnapshotToast is enough
because it can't see "dirty" tuples.


>
> Agreed. With nextXid, we advance ShmemVariableCache->nextXid if the value
> in the online checkpoint record is greater than
> ShmemVariableCache->nextXid. But we don't have such a wraparound-aware
> concept of "greater than" for OIDs. Ignoring the online checkpoint record's
> nextOid value seem fine to me.
>

Ok. Thanks for checking.

Thanks,
Pavan

-- 
 Pavan Deolasee   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: Bugs in TOAST handling, OID assignment and redo recovery

2018-04-10 Thread Heikki Linnakangas

On 10/04/18 13:29, Pavan Deolasee wrote:

Hello,

One of our 2ndQuadrant support customers recently reported a sudden rush of
TOAST errors post a crash recovery, nearly causing an outage. Most errors
read like this:

ERROR: unexpected chunk number 0 (expected 1) for toast value 

While we could bring back the cluster to normal quickly using some
workarounds, I investigated this in more detail and identified two long
standing bugs in TOAST as well as redo recovery.


Great detective work!


ISTM that the right fix is to teach toast_save_datum() to check for
existence of duplicate chunk_id by scanning the table with the same
SnapshotToast that it later uses to fetch the tuples. We already do that in
case of toast rewrite, but not for regular inserts. I propose to do that
for regular path too, as done in the attached patch.


It would seem more straightforward to add a snapshot parameter to 
GetNewOidWithIndex(), so that the this one caller could pass 
SnapshotToast, while others pass SnapshotDirty. With your patch, you 
check the index twice: first with SnapshotDirty, in 
GetNewOidWithIndex(), and then with SnapshotToast, in the caller.


If I'm reading the rewrite case correctly, it's a bit different and 
quite special. In the loop with GetNewOidWithIndex(), it needs to check 
that the OID is unused in two tables, the old TOAST table, and the new 
one. You can only pass one index to GetNewOidWithIndex(), so it needs to 
check the second index manually. It's not because of the snapshot issue. 
Although I wonder if we should be using SnapshotToast in that 
GetNewOidWithIndex() call, too. I.e. if we should be checking both the 
old and the new toast table with SnapshotToast.



I think the right fix is to simply ignore the nextOid counter while
replaying ONLINE checkpoint record. We must have already initialised
ShmemVariableCache->nextOid
to the value stored in this (or some previous) checkpoint record when redo
recovery is started. As and when we see XLOG_NEXTOID record, we would
maintain the shared memory counter correctly. If we don't see any
XLOG_NEXTOID record, the value we started with must be the correct value. I
see no problem even when OID wraps around during redo recovery.
XLOG_NEXTOID should record that correctly.


Agreed. With nextXid, we advance ShmemVariableCache->nextXid if the 
value in the online checkpoint record is greater than 
ShmemVariableCache->nextXid. But we don't have such a wraparound-aware 
concept of "greater than" for OIDs. Ignoring the online checkpoint 
record's nextOid value seem fine to me.


- Heikki



Re: Bugs in TOAST handling, OID assignment and redo recovery

2018-04-10 Thread Bernd Helmle
Am Dienstag, den 10.04.2018, 15:59 +0530 schrieb Pavan Deolasee:
> One of our 2ndQuadrant support customers recently reported a sudden
> rush of
> TOAST errors post a crash recovery, nearly causing an outage. Most
> errors
> read like this:
> 
> ERROR: unexpected chunk number 0 (expected 1) for toast value 
> 
> While we could bring back the cluster to normal quickly using some
> workarounds, I investigated this in more detail and identified two
> long
> standing bugs in TOAST as well as redo recovery.

Wow! I'm currently also investigating issues on a customer system, too,
where suddenly TOAST errors arised after shutdown immediate.

I haven't dug into your findings yet, but it seems to perfectly fit
into the reports i got.

Thanks,
Bernd