Re: Bugs in TOAST handling, OID assignment and redo recovery
On 2018-Apr-11, Peter Eisentraut wrote: > On 4/10/18 06:29, Pavan Deolasee wrote: > > One of our 2ndQuadrant support customers recently reported a sudden rush > > of TOAST errors post a crash recovery, nearly causing an outage. Most > > errors read like this: > > > > ERROR: unexpected chunk number 0 (expected 1) for toast value > > While researching this, I found that the terminology in this code is > quite inconsistent. It talks about chunks ids, chunk indexes, chunk > numbers, etc. seemingly interchangeably. The above error is actually > about the chunk_seq, not about the chunk_id, as one might think. > > The attached patch is my attempt to clean this up a bit. Thoughts? I didn't review your patch carefully, but +1 on the idea. By all means let's not forget this for pg12. -- Álvaro Herrerahttps://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: Bugs in TOAST handling, OID assignment and redo recovery
On 4/18/18 02:37, Pavan Deolasee wrote: > While I agree that we should clean it up, I wonder if changing error > text would be a good idea. These errors are being reported by a very > long time and if we change the text, we might forget the knowledge about > the past reports. It's presumably fixed now, so if we see the new error message, we'll know it's a different issue. -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: Bugs in TOAST handling, OID assignment and redo recovery
On Wed, Apr 18, 2018 at 12:07:52PM +0530, Pavan Deolasee wrote: > > > On Thu, Apr 12, 2018 at 5:53 AM, Peter Eisentraut < > peter.eisentr...@2ndquadrant.com> wrote: > > On 4/10/18 06:29, Pavan Deolasee wrote: > > One of our 2ndQuadrant support customers recently reported a sudden rush > > of TOAST errors post a crash recovery, nearly causing an outage. Most > > errors read like this: > > > > ERROR: unexpected chunk number 0 (expected 1) for toast value > > While researching this, I found that the terminology in this code is > quite inconsistent. It talks about chunks ids, chunk indexes, chunk > numbers, etc. seemingly interchangeably. The above error is actually > about the chunk_seq, not about the chunk_id, as one might think. > > The attached patch is my attempt to clean this up a bit. Thoughts? > > > While I agree that we should clean it up, I wonder if changing error text > would > be a good idea. These errors are being reported by a very long time and if we > change the text, we might forget the knowledge about the past reports. > > Also, "toast value" is same as "chunk_id". Should we clean up something there > too? "chunk_seq number" -- should that be just "chunk_seq"? We can put a comment next to the error message C string if we want to keep historical knowledge. -- Bruce Momjian http://momjian.us EnterpriseDB http://enterprisedb.com + As you are, so once was I. As I am, so you will be. + + Ancient Roman grave inscription +
Re: Bugs in TOAST handling, OID assignment and redo recovery
On Thu, Apr 12, 2018 at 5:53 AM, Peter Eisentraut < peter.eisentr...@2ndquadrant.com> wrote: > On 4/10/18 06:29, Pavan Deolasee wrote: > > One of our 2ndQuadrant support customers recently reported a sudden rush > > of TOAST errors post a crash recovery, nearly causing an outage. Most > > errors read like this: > > > > ERROR: unexpected chunk number 0 (expected 1) for toast value > > While researching this, I found that the terminology in this code is > quite inconsistent. It talks about chunks ids, chunk indexes, chunk > numbers, etc. seemingly interchangeably. The above error is actually > about the chunk_seq, not about the chunk_id, as one might think. > > The attached patch is my attempt to clean this up a bit. Thoughts? > While I agree that we should clean it up, I wonder if changing error text would be a good idea. These errors are being reported by a very long time and if we change the text, we might forget the knowledge about the past reports. Also, "toast value" is same as "chunk_id". Should we clean up something there too? "chunk_seq number" -- should that be just "chunk_seq"? Thanks, Pavan -- Pavan Deolasee http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Re: Bugs in TOAST handling, OID assignment and redo recovery
Pavan Deolasee writes: > On Thu, Apr 12, 2018 at 1:58 AM, Tom Lane wrote: >> So while looking at this, it suddenly occurred to me that probing with >> SnapshotDirty isn't that safe for regular (non-TOAST) Oid assignment >> either. > Yeah it occurred to me as well, but when I looked at the code, I couldn't > find a case that is broken. I even tried a few test cases with DDLs etc. I think it probably can't happen for catalog MVCC scans, because we generally take full new snapshots for those. The situation that would be hazardous is where we use an existing snapshot (so that it doesn't see the just-committed Other Transaction) but advance its command counter (so that it can see the new object we just made). So the sort of failure I'd predict is that a user query just after an object creation could see duplicate OIDs in the catalogs. To get to that point, you might need a stable function (using the troublesome snapshot) calling a volatile one (which contains the actual DDL). > But I think what you did is fine and more bullet proof. So +1 to that. Yeah, even if this isn't actually a reachable case, it's hard to argue that we should complicate the code and take API-break risks in the back branches to make a tiny optimization on the assumption that it can never happen. regards, tom lane
Re: Bugs in TOAST handling, OID assignment and redo recovery
On Thu, Apr 12, 2018 at 5:21 AM, Tom Lane wrote: > Michael Paquier writes: > > I have not really checked this thread in details, but one thing that > > strikes me is that it would be rather easy to add a TAP test based on > > the initial script that Pavan has sent. Would that be worth testing > > cycles or not? > > I doubt it --- that test is so specialized that it'd be unlikely to > catch any other bug. > > I agree; it's not worth adding a TAP test except may be as a demonstration to write future tests for catching such issues. This was a very specialised test case written after getting a full grasp on the bug. And it just tests the thing that I knew is broken based on code reading. Also, with OID duplicate issue fixed, hitting more bugs in this area is going to be even more difficult. Thanks, Pavan -- Pavan Deolasee http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Re: Bugs in TOAST handling, OID assignment and redo recovery
On Thu, Apr 12, 2018 at 1:58 AM, Tom Lane wrote: > So while looking at this, it suddenly occurred to me that probing with > SnapshotDirty isn't that safe for regular (non-TOAST) Oid assignment > either. > Yeah it occurred to me as well, but when I looked at the code, I couldn't find a case that is broken. I even tried a few test cases with DDLs etc. But I think what you did is fine and more bullet proof. So +1 to that. Thanks, Pavan -- Pavan Deolasee http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Re: Bugs in TOAST handling, OID assignment and redo recovery
On 4/10/18 06:29, Pavan Deolasee wrote: > One of our 2ndQuadrant support customers recently reported a sudden rush > of TOAST errors post a crash recovery, nearly causing an outage. Most > errors read like this: > > ERROR: unexpected chunk number 0 (expected 1) for toast value While researching this, I found that the terminology in this code is quite inconsistent. It talks about chunks ids, chunk indexes, chunk numbers, etc. seemingly interchangeably. The above error is actually about the chunk_seq, not about the chunk_id, as one might think. The attached patch is my attempt to clean this up a bit. Thoughts? -- Peter Eisentraut http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services From db7f5eacc23114f24230dd087837d94bcefc4c88 Mon Sep 17 00:00:00 2001 From: Peter Eisentraut Date: Wed, 11 Apr 2018 20:11:35 -0400 Subject: [PATCH] Clarify TOAST terminology in variables and error messages --- src/backend/access/heap/tuptoaster.c | 67 ++-- 1 file changed, 34 insertions(+), 33 deletions(-) diff --git a/src/backend/access/heap/tuptoaster.c b/src/backend/access/heap/tuptoaster.c index cd42c50b09..4113cb29d2 100644 --- a/src/backend/access/heap/tuptoaster.c +++ b/src/backend/access/heap/tuptoaster.c @@ -1882,10 +1882,10 @@ toast_fetch_datum(struct varlena *attr) struct varlena *result; struct varatt_external toast_pointer; int32 ressize; - int32 residx, - nextidx; + int32 res_seq, + next_seq; int32 numchunks; - Pointer chunk; + Pointer chunk_data; boolisnull; char *chunkdata; int32 chunksize; @@ -1930,13 +1930,13 @@ toast_fetch_datum(struct varlena *attr) ObjectIdGetDatum(toast_pointer.va_valueid)); /* -* Read the chunks by index +* Read the chunks by sequence number * -* Note that because the index is actually on (valueid, chunkidx) we will -* see the chunks in chunkidx order, even though we didn't explicitly ask +* Note that because the index is actually on (chunk_id, chunk_seq) we will +* see the chunks in chunk_seq order, even though we didn't explicitly ask * for it. */ - nextidx = 0; + next_seq = 0; init_toast_snapshot(&SnapshotToast); toastscan = systable_beginscan_ordered(toastrel, toastidxs[validIndex], @@ -1946,20 +1946,20 @@ toast_fetch_datum(struct varlena *attr) /* * Have a chunk, extract the sequence number and the data */ - residx = DatumGetInt32(fastgetattr(ttup, 2, toasttupDesc, &isnull)); + res_seq = DatumGetInt32(fastgetattr(ttup, 2, toasttupDesc, &isnull)); Assert(!isnull); - chunk = DatumGetPointer(fastgetattr(ttup, 3, toasttupDesc, &isnull)); + chunk_data = DatumGetPointer(fastgetattr(ttup, 3, toasttupDesc, &isnull)); Assert(!isnull); - if (!VARATT_IS_EXTENDED(chunk)) + if (!VARATT_IS_EXTENDED(chunk_data)) { - chunksize = VARSIZE(chunk) - VARHDRSZ; - chunkdata = VARDATA(chunk); + chunksize = VARSIZE(chunk_data) - VARHDRSZ; + chunkdata = VARDATA(chunk_data); } - else if (VARATT_IS_SHORT(chunk)) + else if (VARATT_IS_SHORT(chunk_data)) { /* could happen due to heap_form_tuple doing its thing */ - chunksize = VARSIZE_SHORT(chunk) - VARHDRSZ_SHORT; - chunkdata = VARDATA_SHORT(chunk); + chunksize = VARSIZE_SHORT(chunk_data) - VARHDRSZ_SHORT; + chunkdata = VARDATA_SHORT(chunk_data); } else { @@ -1974,33 +1974,34 @@ toast_fetch_datum(struct varlena *attr) /* * Some checks on the data we've found */ - if (residx != nextidx) - elog(ERROR, "unexpected chunk number %d (expected %d) for toast value %u in %s", -residx, nextidx, + if (res_seq != next_seq) + elog(ERROR, "unexpected chunk_seq %d (expected %d) for toast value %u in %s", +res_seq, next_seq, toast_pointer.va_valueid, RelationGetRelationName(toastrel)); - if (residx < numchunks - 1) + + if (res_seq < numchunks - 1) { if (chunksize != TOAST_MAX_CHUNK_SIZE) -
Re: Bugs in TOAST handling, OID assignment and redo recovery
Michael Paquier writes: > I have not really checked this thread in details, but one thing that > strikes me is that it would be rather easy to add a TAP test based on > the initial script that Pavan has sent. Would that be worth testing > cycles or not? I doubt it --- that test is so specialized that it'd be unlikely to catch any other bug. An example of the sort of bug that could exist in the same area would be if we'd failed to note the places in tuptoaster.c that also need to be switched to SnapshotAny. Only a test covering heap-rewrite could have found that, which this test case surely doesn't. What I think would actually be worth spending some time on is taking a hard look at the remaining uses of SnapshotDirty to see if any of them look squishy. regards, tom lane
Re: Bugs in TOAST handling, OID assignment and redo recovery
On Wed, Apr 11, 2018 at 04:28:33PM -0400, Tom Lane wrote: > I propose therefore that the right fix does not require an API change > for GetNewOidWithIndex: we can just make it use SnapshotAny all the > time. I have not really checked this thread in details, but one thing that strikes me is that it would be rather easy to add a TAP test based on the initial script that Pavan has sent. Would that be worth testing cycles or not? We do have tests like 008_fsm_truncation.pl which test similar failure scenarios, so I think that we should have one. -- Michael signature.asc Description: PGP signature
Re: Bugs in TOAST handling, OID assignment and redo recovery
So while looking at this, it suddenly occurred to me that probing with SnapshotDirty isn't that safe for regular (non-TOAST) Oid assignment either. SnapshotDirty will consider a row dead the instant the deleting transaction has committed, but it may remain visible to other transactions for awhile after that --- and now that we use MVCC for catalog scans, that applies to them too. Hence, the existing logic is capable of creating objects with transiently-conflicting OIDs. I don't think this could create a conflict that's visible outside our own transaction, since anyone who can see our commit would also see the commit of the deleting transaction. But there's definitely a hazard inside the transaction that creates a new object. I propose therefore that the right fix does not require an API change for GetNewOidWithIndex: we can just make it use SnapshotAny all the time. regards, tom lane
Re: Bugs in TOAST handling, OID assignment and redo recovery
On 11 April 2018 at 19:57, Tom Lane wrote: > Pavan Deolasee writes: >> Ok. I propose attached patches, more polished this time. > > I'll take these, unless some other committer is hot to do so? Please go ahead. -- Simon Riggshttp://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: Bugs in TOAST handling, OID assignment and redo recovery
Pavan Deolasee writes: > Ok. I propose attached patches, more polished this time. I'll take these, unless some other committer is hot to do so? regards, tom lane
Re: Bugs in TOAST handling, OID assignment and redo recovery
On Wed, Apr 11, 2018 at 8:20 PM, Tom Lane wrote: > Pavan Deolasee writes: > > Or may be we simply err on the side of caution and scan the toast table > > with SnapshotAny while looking for a duplicate? That might prevent us > from > > reusing an OID for a known-dead tuple, but should save us a second index > > scan and still work. > > +1. We really don't want to expend two indexscans on this. > > Ok. I propose attached patches, more polished this time. I also changed toastrel_valueid_exists() to use SnapshotAny, but I don't have any proofs to claim that's a needed change. But it seems harmless and given the number of toast related, hard to chase bugs we have seen over the years, may it's worth making it full proof too. Thanks, Pavan -- Pavan Deolasee http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services 0002-Do-more-extensive-search-while-looking-for-duplicate.patch Description: Binary data 0001-Do-not-overwrite-the-nextOid-counter-while-replaying.patch Description: Binary data
Re: Bugs in TOAST handling, OID assignment and redo recovery
Pavan Deolasee writes: > Or may be we simply err on the side of caution and scan the toast table > with SnapshotAny while looking for a duplicate? That might prevent us from > reusing an OID for a known-dead tuple, but should save us a second index > scan and still work. +1. We really don't want to expend two indexscans on this. I was worried about changing the signature of GetNewOidWithIndex in a back-patched fix, but after looking around I think that's probably safe. External callers really shouldn't be using anything lower-level than GetNewOid. regards, tom lane
Re: Bugs in TOAST handling, OID assignment and redo recovery
On Tue, Apr 10, 2018 at 7:24 PM, Pavan Deolasee wrote: > Hi Heikki, > > On Tue, Apr 10, 2018 at 7:07 PM, Heikki Linnakangas > wrote: > >> >>> >> It would seem more straightforward to add a snapshot parameter to >> GetNewOidWithIndex(), so that the this one caller could pass SnapshotToast, >> while others pass SnapshotDirty. With your patch, you check the index >> twice: first with SnapshotDirty, in GetNewOidWithIndex(), and then with >> SnapshotToast, in the caller. >> > > Hmm. I actually wrote my first patch exactly like that. I am trying to > remember why I discarded that approach. Was that to do with the fact that > SnapshotToast > can't see all toast tuples either? Yeah, I think so. For example, it won't > see tuples with uncommitted-xmin, leading to different issues. Now it's > unlikely that we will have a OID conflict where the old tuple has > uncommitted-xmin, but not sure if we can completely rule that out. > Or may be we simply err on the side of caution and scan the toast table with SnapshotAny while looking for a duplicate? That might prevent us from reusing an OID for a known-dead tuple, but should save us a second index scan and still work. Thanks, Pavan -- Pavan Deolasee http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Re: Bugs in TOAST handling, OID assignment and redo recovery
Hi Heikki, On Tue, Apr 10, 2018 at 7:07 PM, Heikki Linnakangas wrote: > >> > It would seem more straightforward to add a snapshot parameter to > GetNewOidWithIndex(), so that the this one caller could pass SnapshotToast, > while others pass SnapshotDirty. With your patch, you check the index > twice: first with SnapshotDirty, in GetNewOidWithIndex(), and then with > SnapshotToast, in the caller. > Hmm. I actually wrote my first patch exactly like that. I am trying to remember why I discarded that approach. Was that to do with the fact that SnapshotToast can't see all toast tuples either? Yeah, I think so. For example, it won't see tuples with uncommitted-xmin, leading to different issues. Now it's unlikely that we will have a OID conflict where the old tuple has uncommitted-xmin, but not sure if we can completely rule that out. For example, if we did not uncover the redo recovery bug, we could have had a prepared transaction having inserted the old tuple, which now conflicts with new tuple and not detected by SnapshotToast. > > If I'm reading the rewrite case correctly, it's a bit different and quite > special. In the loop with GetNewOidWithIndex(), it needs to check that the > OID is unused in two tables, the old TOAST table, and the new one. You can > only pass one index to GetNewOidWithIndex(), so it needs to check the > second index manually. It's not because of the snapshot issue. Although I > wonder if we should be using SnapshotToast in that GetNewOidWithIndex() > call, too. I.e. if we should be checking both the old and the new toast > table with SnapshotToast. As I said, I am not sure if checking with just SnapshotToast is enough because it can't see "dirty" tuples. > > Agreed. With nextXid, we advance ShmemVariableCache->nextXid if the value > in the online checkpoint record is greater than > ShmemVariableCache->nextXid. But we don't have such a wraparound-aware > concept of "greater than" for OIDs. Ignoring the online checkpoint record's > nextOid value seem fine to me. > Ok. Thanks for checking. Thanks, Pavan -- Pavan Deolasee http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
Re: Bugs in TOAST handling, OID assignment and redo recovery
On 10/04/18 13:29, Pavan Deolasee wrote: Hello, One of our 2ndQuadrant support customers recently reported a sudden rush of TOAST errors post a crash recovery, nearly causing an outage. Most errors read like this: ERROR: unexpected chunk number 0 (expected 1) for toast value While we could bring back the cluster to normal quickly using some workarounds, I investigated this in more detail and identified two long standing bugs in TOAST as well as redo recovery. Great detective work! ISTM that the right fix is to teach toast_save_datum() to check for existence of duplicate chunk_id by scanning the table with the same SnapshotToast that it later uses to fetch the tuples. We already do that in case of toast rewrite, but not for regular inserts. I propose to do that for regular path too, as done in the attached patch. It would seem more straightforward to add a snapshot parameter to GetNewOidWithIndex(), so that the this one caller could pass SnapshotToast, while others pass SnapshotDirty. With your patch, you check the index twice: first with SnapshotDirty, in GetNewOidWithIndex(), and then with SnapshotToast, in the caller. If I'm reading the rewrite case correctly, it's a bit different and quite special. In the loop with GetNewOidWithIndex(), it needs to check that the OID is unused in two tables, the old TOAST table, and the new one. You can only pass one index to GetNewOidWithIndex(), so it needs to check the second index manually. It's not because of the snapshot issue. Although I wonder if we should be using SnapshotToast in that GetNewOidWithIndex() call, too. I.e. if we should be checking both the old and the new toast table with SnapshotToast. I think the right fix is to simply ignore the nextOid counter while replaying ONLINE checkpoint record. We must have already initialised ShmemVariableCache->nextOid to the value stored in this (or some previous) checkpoint record when redo recovery is started. As and when we see XLOG_NEXTOID record, we would maintain the shared memory counter correctly. If we don't see any XLOG_NEXTOID record, the value we started with must be the correct value. I see no problem even when OID wraps around during redo recovery. XLOG_NEXTOID should record that correctly. Agreed. With nextXid, we advance ShmemVariableCache->nextXid if the value in the online checkpoint record is greater than ShmemVariableCache->nextXid. But we don't have such a wraparound-aware concept of "greater than" for OIDs. Ignoring the online checkpoint record's nextOid value seem fine to me. - Heikki
Re: Bugs in TOAST handling, OID assignment and redo recovery
Am Dienstag, den 10.04.2018, 15:59 +0530 schrieb Pavan Deolasee: > One of our 2ndQuadrant support customers recently reported a sudden > rush of > TOAST errors post a crash recovery, nearly causing an outage. Most > errors > read like this: > > ERROR: unexpected chunk number 0 (expected 1) for toast value > > While we could bring back the cluster to normal quickly using some > workarounds, I investigated this in more detail and identified two > long > standing bugs in TOAST as well as redo recovery. Wow! I'm currently also investigating issues on a customer system, too, where suddenly TOAST errors arised after shutdown immediate. I haven't dug into your findings yet, but it seems to perfectly fit into the reports i got. Thanks, Bernd