Re: pg_amcheck contrib application
On Fri, Apr 30, 2021 at 5:07 PM Robert Haas wrote: > On Fri, Apr 30, 2021 at 4:26 PM Mark Dilger > wrote: > > After further reflection, no other verbiage seems any better. I'd say go > > ahead and commit it this way. > > OK. I'll plan to do that on Monday, barring objections. Done now. -- Robert Haas EDB: http://www.enterprisedb.com
Re: pg_amcheck contrib application
On Fri, Apr 30, 2021 at 4:26 PM Mark Dilger wrote: > After further reflection, no other verbiage seems any better. I'd say go > ahead and commit it this way. OK. I'll plan to do that on Monday, barring objections. -- Robert Haas EDB: http://www.enterprisedb.com
Re: pg_amcheck contrib application
> On Apr 30, 2021, at 1:04 PM, Mark Dilger wrote: > >> toast value %u was expected to end at chunk %d, but ended while >> expecting chunk %d >> >> i.e. same as the currently-committed code, except for changing "ended >> at" to "ended while expecting." > > I find the grammar of this new formulation anomalous for hard to articulate > reasons not quite the same as but akin to mismatched verb aspect. After further reflection, no other verbiage seems any better. I'd say go ahead and commit it this way. — Mark Dilger EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: pg_amcheck contrib application
> On Apr 30, 2021, at 12:47 PM, Robert Haas wrote: > > Hmm, I think that might need adjustment, actually. What I was trying > to do is compensate for the fact that what we now have is the next > chunk_seq value we expect, not the last one we saw, nor the total > number of chunks we've seen regardless of what chunk_seq they had. But > I thought it would be too confusing to just give the chunk number we > were expecting and not say anything about how many chunks we thought > there would be in total. So maybe what I should do is change it to > something like this: > > toast value %u was expected to end at chunk %d, but ended while > expecting chunk %d > > i.e. same as the currently-committed code, except for changing "ended > at" to "ended while expecting." I find the grammar of this new formulation anomalous for hard to articulate reasons not quite the same as but akin to mismatched verb aspect. — Mark Dilger EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: pg_amcheck contrib application
On Fri, Apr 30, 2021 at 3:41 PM Mark Dilger wrote: > I think that's committable. > > The only nitpick might be > > - psprintf("toast value %u was expected to end > at chunk %d, but ended at chunk %d", > + psprintf("toast value %u index scan ended > early while expecting chunk %d of %d", > > When reporting to users about positions within a zero-based indexing scheme, > what does "while expecting chunk 3 of 4" mean? Is it talking about the last > chunk from the set [0..3] which has cardinality 4, or does it mean the > next-to-last chunk from [0..4] which ends with chunk 4, or what? The prior > language isn't any more clear than what you have here, so I have no objection > to committing this, but the prior language was probably as goofy as it was > because it was trying to deal with this issue. Hmm, I think that might need adjustment, actually. What I was trying to do is compensate for the fact that what we now have is the next chunk_seq value we expect, not the last one we saw, nor the total number of chunks we've seen regardless of what chunk_seq they had. But I thought it would be too confusing to just give the chunk number we were expecting and not say anything about how many chunks we thought there would be in total. So maybe what I should do is change it to something like this: toast value %u was expected to end at chunk %d, but ended while expecting chunk %d i.e. same as the currently-committed code, except for changing "ended at" to "ended while expecting." -- Robert Haas EDB: http://www.enterprisedb.com
Re: pg_amcheck contrib application
> On Apr 30, 2021, at 12:29 PM, Robert Haas wrote: > > OK, how about this version? I think that's committable. The only nitpick might be - psprintf("toast value %u was expected to end at chunk %d, but ended at chunk %d", + psprintf("toast value %u index scan ended early while expecting chunk %d of %d", When reporting to users about positions within a zero-based indexing scheme, what does "while expecting chunk 3 of 4" mean? Is it talking about the last chunk from the set [0..3] which has cardinality 4, or does it mean the next-to-last chunk from [0..4] which ends with chunk 4, or what? The prior language isn't any more clear than what you have here, so I have no objection to committing this, but the prior language was probably as goofy as it was because it was trying to deal with this issue. Thoughts? — Mark Dilger EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: pg_amcheck contrib application
On Fri, Apr 30, 2021 at 3:26 PM Mark Dilger wrote: > It looks mostly good to me. There is a off-by-one error introduced with: > > - else if (chunkno != (endchunk + 1)) > + else if (expected_chunk_seq < last_chunk_seq) > > I think that needs to be > > + else if (expected_chunk_seq <= last_chunk_seq) > > because otherwise it won't complain if the only missing chunk is the very > last one. OK, how about this version? -- Robert Haas EDB: http://www.enterprisedb.com simply-remove-chunkno-concept-v4.patch Description: Binary data
Re: pg_amcheck contrib application
> On Apr 30, 2021, at 11:56 AM, Robert Haas wrote: > > Can you review this version? It looks mostly good to me. There is a off-by-one error introduced with: - else if (chunkno != (endchunk + 1)) + else if (expected_chunk_seq < last_chunk_seq) I think that needs to be + else if (expected_chunk_seq <= last_chunk_seq) because otherwise it won't complain if the only missing chunk is the very last one. — Mark Dilger EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: pg_amcheck contrib application
> On Apr 30, 2021, at 11:56 AM, Robert Haas wrote: > > On Fri, Apr 30, 2021 at 2:31 PM Mark Dilger > wrote: >> Just to be clear, I did not use your patch v1 as the starting point. > > I thought that might be the case, but I was trying to understand what > you didn't like about my version, and comparing them seemed like a way > to figure that out. > >> I took the code as committed to master as the starting point, used your >> corruption report verbiage changes and at least some of your variable naming >> choices, but did not use the rest, in large part because it didn't work. It >> caused corruption messages to be reported against tables that have no >> corruption. For that matter, your v2 patch doesn't work either, and in the >> same way. To wit: >> >> heap table "postgres"."pg_catalog"."pg_rewrite", block 6, offset 4, >> attribute 7: >> toast value 13461 chunk 0 has size 1995, but expected size 1996 >> >> I think there is something wrong with the way you are trying to calculate >> and use extsize, because I'm not corrupting pg_catalog.pg_rewrite. You can >> get these same results by applying your patch to master, building, and >> running 'make check' from src/bin/pg_amcheck/ > > Argh, OK, I didn't realize. Should be fixed in this version. > >>> 4. You don't return if chunk_seq > last_chunk_seq. That seems wrong, >>> because we cannot compute a sensible expected size in that case. I >>> think your code will subtract a larger value from a smaller one and, >>> this being unsigned arithmetic, say that the expected chunk size is >>> something gigantic. >> >> Your conclusion is probably right, but I think your analysis is based on a >> misreading of what "last_chunk_seq" means. It's not the last one seen, but >> the last one expected. (Should we rename the variable to avoid confusion?) >> It won't compute a gigantic size. Rather, it will expect *every* chunk with >> chunk_seq >= last_chunk_seq to have whatever size is appropriate for the >> last chunk. > > I realize it's the last one expected. That's the point: we don't have > any expectation for the sizes of chunks higher than the last one we > expected to see. If the value is 2000 bytes and the chunk size is 1996 > bytes, we expect chunk 0 to be 1996 bytes and chunk 1 to be 4 bytes. > If not, we can complain. But it makes no sense to complain about chunk > 2 being of a size we don't expect. We don't expect it to exist in the > first place, so we have no notion of what size it ought to be. > >> If we have seen any chunks, the variable is holding the expected next chunk >> seq, which is one greater than the last chunk seq we saw. >> >> If we expect chunks 0..3 and see chunk 0 but not chunk 1, it will complain >> ..."expected to end at chunk 4, but ended at chunk 1". This is clearly by >> design and not merely a bug, though I tend to agree with you that this is a >> strange wording choice. I can't remember exactly when and how we decided to >> word the message this way, but it has annoyed me for a while, and I assumed >> it was something you suggested a while back, because I don't recall doing >> it. Either way, since you seem to also be bothered by this, I agree we >> should change it. > > Can you review this version? > > -- > Robert Haas > EDB: http://www.enterprisedb.com > As requested off-list, here are NOT FOR COMMIT, WIP patches for testing only. The first patch allows toast tables to be updated and adds regression tests of corrupted toasted attributes. I never quite got deletes from toast tables to work, and there are probably other gotchas still lurking even with inserts and updates, but it limps along well enough for testing pg_amcheck. The second patch updates the expected output of pg_amcheck to match the verbiage that you suggested upthread. v1-0001-Adding-modify-toast-and-test-pg_amcheck.patch.WIP Description: Binary data v1-0002-Modifying-toast-corruption-test-expected-output.patch.WIP Description: Binary data — Mark Dilger EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: pg_amcheck contrib application
On Fri, Apr 30, 2021 at 2:31 PM Mark Dilger wrote: > Just to be clear, I did not use your patch v1 as the starting point. I thought that might be the case, but I was trying to understand what you didn't like about my version, and comparing them seemed like a way to figure that out. > I took the code as committed to master as the starting point, used your > corruption report verbiage changes and at least some of your variable naming > choices, but did not use the rest, in large part because it didn't work. It > caused corruption messages to be reported against tables that have no > corruption. For that matter, your v2 patch doesn't work either, and in the > same way. To wit: > > heap table "postgres"."pg_catalog"."pg_rewrite", block 6, offset 4, > attribute 7: > toast value 13461 chunk 0 has size 1995, but expected size 1996 > > I think there is something wrong with the way you are trying to calculate and > use extsize, because I'm not corrupting pg_catalog.pg_rewrite. You can get > these same results by applying your patch to master, building, and running > 'make check' from src/bin/pg_amcheck/ Argh, OK, I didn't realize. Should be fixed in this version. > > 4. You don't return if chunk_seq > last_chunk_seq. That seems wrong, > > because we cannot compute a sensible expected size in that case. I > > think your code will subtract a larger value from a smaller one and, > > this being unsigned arithmetic, say that the expected chunk size is > > something gigantic. > > Your conclusion is probably right, but I think your analysis is based on a > misreading of what "last_chunk_seq" means. It's not the last one seen, but > the last one expected. (Should we rename the variable to avoid confusion?) > It won't compute a gigantic size. Rather, it will expect *every* chunk with > chunk_seq >= last_chunk_seq to have whatever size is appropriate for the last > chunk. I realize it's the last one expected. That's the point: we don't have any expectation for the sizes of chunks higher than the last one we expected to see. If the value is 2000 bytes and the chunk size is 1996 bytes, we expect chunk 0 to be 1996 bytes and chunk 1 to be 4 bytes. If not, we can complain. But it makes no sense to complain about chunk 2 being of a size we don't expect. We don't expect it to exist in the first place, so we have no notion of what size it ought to be. > If we have seen any chunks, the variable is holding the expected next chunk > seq, which is one greater than the last chunk seq we saw. > > If we expect chunks 0..3 and see chunk 0 but not chunk 1, it will complain > ..."expected to end at chunk 4, but ended at chunk 1". This is clearly by > design and not merely a bug, though I tend to agree with you that this is a > strange wording choice. I can't remember exactly when and how we decided to > word the message this way, but it has annoyed me for a while, and I assumed > it was something you suggested a while back, because I don't recall doing it. > Either way, since you seem to also be bothered by this, I agree we should > change it. Can you review this version? -- Robert Haas EDB: http://www.enterprisedb.com simply-remove-chunkno-concept-v3.patch Description: Binary data
Re: pg_amcheck contrib application
> On Apr 30, 2021, at 9:39 AM, Robert Haas wrote: > > On Mon, Apr 26, 2021 at 1:52 PM Mark Dilger > wrote: >> The attached patch changes amcheck corruption reports as discussed upthread. >> This patch is submitted for the v14 development cycle as a bug fix, per >> your complaint that the committed code generates reports sufficiently >> confusing to a user as to constitute a bug. >> >> All other code refactoring and additional checks discussed upthread are >> reserved for the v15 development cycle and are not included here. >> >> The minimal patch (not attached) that does not rename any variables is 135 >> lines. Your patch was 159 lines. The patch (attached) which includes your >> variable renaming is 174 lines. > > Hi, > > I have compared this against my version. I found the following differences: Just to be clear, I did not use your patch v1 as the starting point. I took the code as committed to master as the starting point, used your corruption report verbiage changes and at least some of your variable naming choices, but did not use the rest, in large part because it didn't work. It caused corruption messages to be reported against tables that have no corruption. For that matter, your v2 patch doesn't work either, and in the same way. To wit: heap table "postgres"."pg_catalog"."pg_rewrite", block 6, offset 4, attribute 7: toast value 13461 chunk 0 has size 1995, but expected size 1996 I think there is something wrong with the way you are trying to calculate and use extsize, because I'm not corrupting pg_catalog.pg_rewrite. You can get these same results by applying your patch to master, building, and running 'make check' from src/bin/pg_amcheck/ > 1. This version passes last_chunk_seq rather than extsize to > check_toast_tuple(). But this results in having to call > VARATT_EXTERNAL_GET_EXTSIZE() inside that function. I thought it was > nicer to do that in the caller, so that we don't do it twice. I don't see that VARATT_EXTERNAL_GET_EXTSIZE() is worth too much concern, given that is just a struct access and a bit mask. You are avoiding calculating that twice, but at the expense of calculating last_chunk_seq twice, which involves division. I don't think the division can be optimized as a mere bit shift, since TOAST_MAX_CHUNK_SIZE is not in general a power of two. (For example, on my laptop it is 1996.) I don't say this to nitpick at the performance one way vs. the other. I doubt it makes any real difference. I'm just confused why you want to change this particular thing right now, given that it is not a bug. > 2. You fixed some out-of-date comments. Yes, because they were wrong. That's on me. I failed to update them in a prior patch. > 3. You move the test for an unexpected chunk sequence further down in > the function. I don't see the point; Relative to your patch, perhaps. Relative to master, no tests have been moved. > I had put it by the related null > check, and still think that's better. You also deleted my comment /* > Either the TOAST index is corrupt, or we don't have all chunks. */ > which I would have preferred to keep. That's fine. I didn't mean to remove it. I was just taking a minimalist approach to constructing the patch. > 4. You don't return if chunk_seq > last_chunk_seq. That seems wrong, > because we cannot compute a sensible expected size in that case. I > think your code will subtract a larger value from a smaller one and, > this being unsigned arithmetic, say that the expected chunk size is > something gigantic. Your conclusion is probably right, but I think your analysis is based on a misreading of what "last_chunk_seq" means. It's not the last one seen, but the last one expected. (Should we rename the variable to avoid confusion?) It won't compute a gigantic size. Rather, it will expect *every* chunk with chunk_seq >= last_chunk_seq to have whatever size is appropriate for the last chunk. > Returning and not issuing that complaint at all > seems better. That might be best. I had been resisting that because I don't want the extraneous chunks to be reported without chunk size information. When debugging corrupted toast, it may be interesting to know the size of the extraneous chunks. If there are 1000 extra chunks, somebody might want to see the sizes of them. > 5. You fixed the incorrect formula I had introduced for the expected > size of the last chunk. Not really. I just didn't introduce any change in that area. > 6. You changed the variable name in check_toasted_attribute() from > expected_chunkno to chunkno, and initialized it later in the function > instead of at declaration time. I don't find this to be an > improvement; I think I just left the variable name and its initialization unchanged. > including the word "expected" seems to me to be > substantially clearer. But I think I should have gone with > expected_chunk_seq for better consistency. I agree that is a better name.
Re: pg_amcheck contrib application
On Mon, Apr 26, 2021 at 1:52 PM Mark Dilger wrote: > The attached patch changes amcheck corruption reports as discussed upthread. > This patch is submitted for the v14 development cycle as a bug fix, per your > complaint that the committed code generates reports sufficiently confusing to > a user as to constitute a bug. > > All other code refactoring and additional checks discussed upthread are > reserved for the v15 development cycle and are not included here. > > The minimal patch (not attached) that does not rename any variables is 135 > lines. Your patch was 159 lines. The patch (attached) which includes your > variable renaming is 174 lines. Hi, I have compared this against my version. I found the following differences: 1. This version passes last_chunk_seq rather than extsize to check_toast_tuple(). But this results in having to call VARATT_EXTERNAL_GET_EXTSIZE() inside that function. I thought it was nicer to do that in the caller, so that we don't do it twice. 2. You fixed some out-of-date comments. 3. You move the test for an unexpected chunk sequence further down in the function. I don't see the point; I had put it by the related null check, and still think that's better. You also deleted my comment /* Either the TOAST index is corrupt, or we don't have all chunks. */ which I would have preferred to keep. 4. You don't return if chunk_seq > last_chunk_seq. That seems wrong, because we cannot compute a sensible expected size in that case. I think your code will subtract a larger value from a smaller one and, this being unsigned arithmetic, say that the expected chunk size is something gigantic. Returning and not issuing that complaint at all seems better. 5. You fixed the incorrect formula I had introduced for the expected size of the last chunk. 6. You changed the variable name in check_toasted_attribute() from expected_chunkno to chunkno, and initialized it later in the function instead of at declaration time. I don't find this to be an improvement; including the word "expected" seems to me to be substantially clearer. But I think I should have gone with expected_chunk_seq for better consistency. 7. You restored the message "toast value %u was expected to end at chunk %d, but ended at chunk %d" which my version deleted. I deleted that message because I thought it was redundant, but I guess it's not: there's nothing else to complain if the sequence of chunks ends early. I think we should change the test from != to < though, because if it's >, then we must have already complained about unexpected chunks. Also, I think the message is actually wrong, because even though you renamed the variable, it still ends up being the expected next chunkno rather than the last chunkno we actually saw. PFA my counter-proposal based on the above analysis. -- Robert Haas EDB: http://www.enterprisedb.com simply-remove-chunkno-concept-v2.patch Description: Binary data
Re: pg_amcheck contrib application
> On Apr 23, 2021, at 3:01 PM, Mark Dilger wrote: > > I'll try to post something that accomplishes the changes to the reports that > you are looking for. The attached patch changes amcheck corruption reports as discussed upthread. This patch is submitted for the v14 development cycle as a bug fix, per your complaint that the committed code generates reports sufficiently confusing to a user as to constitute a bug. All other code refactoring and additional checks discussed upthread are reserved for the v15 development cycle and are not included here. The minimal patch (not attached) that does not rename any variables is 135 lines. Your patch was 159 lines. The patch (attached) which includes your variable renaming is 174 lines. v23-0001-amcheck-adjusting-corruption-report-output.patch Description: Binary data — Mark Dilger EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: pg_amcheck contrib application
> On Apr 23, 2021, at 1:31 PM, Robert Haas wrote: > > Perhaps something like this, closer to the way you had it? > > expected_size = chunk_seq < last_chunk_seq ? TOAST_MAX_CHUNK_SIZE > : extsize - (last_chunk_seq * TOAST_MAX_CHUNK_SIZE); It still suffers the same failures. I'll try to post something that accomplishes the changes to the reports that you are looking for. — Mark Dilger EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: pg_amcheck contrib application
On Fri, Apr 23, 2021 at 2:36 PM Mark Dilger wrote: > > What's different? > > for one thing, if a sequence of chunks happens to fit perfectly, the final > chunk will have size TOAST_MAX_CHUNK_SIZE, but you're expecting no larger > than one less than that, given how modulo arithmetic works. Good point. Perhaps something like this, closer to the way you had it? expected_size = chunk_seq < last_chunk_seq ? TOAST_MAX_CHUNK_SIZE : extsize - (last_chunk_seq * TOAST_MAX_CHUNK_SIZE); -- Robert Haas EDB: http://www.enterprisedb.com
Re: pg_amcheck contrib application
> On Apr 23, 2021, at 11:29 AM, Robert Haas wrote: > > + expected_size = chunk_seq < last_chunk_seq ? TOAST_MAX_CHUNK_SIZE > + : extsize % TOAST_MAX_CHUNK_SIZE; > > What's different? for one thing, if a sequence of chunks happens to fit perfectly, the final chunk will have size TOAST_MAX_CHUNK_SIZE, but you're expecting no larger than one less than that, given how modulo arithmetic works. — Mark Dilger EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: pg_amcheck contrib application
> On Apr 23, 2021, at 11:29 AM, Robert Haas wrote: > > On Fri, Apr 23, 2021 at 2:15 PM Mark Dilger > wrote: >> Another difference I should probably mention is that a bunch of unrelated >> tests are failing with errors like: >> >>toast value 13465 chunk 0 has size 1995, but expected size 1996 >> >> which leads me to suspect your changes to how the size is calculated. > > That seems like a pretty reasonable suspicion, but I can't see the problem: > > - expected_size = curchunk < endchunk ? TOAST_MAX_CHUNK_SIZE > - : VARATT_EXTERNAL_GET_EXTSIZE(ta->toast_pointer) - > (endchunk * TOAST_MAX_CHUNK_SIZE); > + expected_size = chunk_seq < last_chunk_seq ? TOAST_MAX_CHUNK_SIZE > + : extsize % TOAST_MAX_CHUNK_SIZE; > > What's different? > > 1. The variables are renamed. > > 2. It uses a new variable extsize instead of recomputing > VARATT_EXTERNAL_GET_EXTSIZE(ta->toast_pointer), but I think that > should have the same value. > > 3. I used modulo arithmetic (%) instead of subtracting endchunk * > TOAST_MAX_CHUNK_SIZE. > > Is TOAST_MAX_CHUNK_SIZE 1996? How long a value did you insert? On my laptop, yes, 1996 is TOAST_MAX_CHUNK_SIZE. I'm not inserting anything. These failures come from just regular tests that I have not changed. I just applied your patch and ran `make check-world` and these fail in src/bin/pg_amcheck — Mark Dilger EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: pg_amcheck contrib application
On Fri, Apr 23, 2021 at 2:15 PM Mark Dilger wrote: > Another difference I should probably mention is that a bunch of unrelated > tests are failing with errors like: > > toast value 13465 chunk 0 has size 1995, but expected size 1996 > > which leads me to suspect your changes to how the size is calculated. That seems like a pretty reasonable suspicion, but I can't see the problem: - expected_size = curchunk < endchunk ? TOAST_MAX_CHUNK_SIZE - : VARATT_EXTERNAL_GET_EXTSIZE(ta->toast_pointer) - (endchunk * TOAST_MAX_CHUNK_SIZE); + expected_size = chunk_seq < last_chunk_seq ? TOAST_MAX_CHUNK_SIZE + : extsize % TOAST_MAX_CHUNK_SIZE; What's different? 1. The variables are renamed. 2. It uses a new variable extsize instead of recomputing VARATT_EXTERNAL_GET_EXTSIZE(ta->toast_pointer), but I think that should have the same value. 3. I used modulo arithmetic (%) instead of subtracting endchunk * TOAST_MAX_CHUNK_SIZE. Is TOAST_MAX_CHUNK_SIZE 1996? How long a value did you insert? -- Robert Haas EDB: http://www.enterprisedb.com
Re: pg_amcheck contrib application
> On Apr 23, 2021, at 11:05 AM, Mark Dilger > wrote: > > Here are the differences between master and you patch: Another difference I should probably mention is that a bunch of unrelated tests are failing with errors like: toast value 13465 chunk 0 has size 1995, but expected size 1996 which leads me to suspect your changes to how the size is calculated. — Mark Dilger EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: pg_amcheck contrib application
On Fri, Apr 23, 2021 at 2:05 PM Mark Dilger wrote: > Here are the differences between master and you patch: Thanks. Those messages look reasonable to me. -- Robert Haas EDB: http://www.enterprisedb.com
Re: pg_amcheck contrib application
> On Apr 23, 2021, at 10:31 AM, Mark Dilger > wrote: > > I will test your patch and see what differs. Here are the differences between master and you patch: UPDATE $toastname SET chunk_seq = chunk_seq + 1000 WHERE chunk_id = $value_id_to_corrupt - qr/${header}toast value 16459 chunk 0 has sequence number 1000, but expected sequence number 0/, - qr/${header}toast value 16459 chunk 1 has sequence number 1001, but expected sequence number 1/, - qr/${header}toast value 16459 chunk 2 has sequence number 1002, but expected sequence number 2/, - qr/${header}toast value 16459 chunk 3 has sequence number 1003, but expected sequence number 3/, - qr/${header}toast value 16459 chunk 4 has sequence number 1004, but expected sequence number 4/, - qr/${header}toast value 16459 chunk 5 has sequence number 1005, but expected sequence number 5/; + qr/${header}toast value 16459 index scan returned chunk 1000 when expecting chunk 0/, + qr/${header}toast value 16459 chunk 1000 follows last expected chunk 5/, + qr/${header}toast value 16459 chunk 1001 follows last expected chunk 5/, + qr/${header}toast value 16459 chunk 1002 follows last expected chunk 5/, + qr/${header}toast value 16459 chunk 1003 follows last expected chunk 5/, + qr/${header}toast value 16459 chunk 1004 follows last expected chunk 5/, + qr/${header}toast value 16459 chunk 1005 follows last expected chunk 5/; UPDATE $toastname SET chunk_seq = chunk_seq * 1000 WHERE chunk_id = $value_id_to_corrupt - qr/${header}toast value $value_id_to_corrupt chunk 1 has sequence number 1000, but expected sequence number 1/, - qr/${header}toast value $value_id_to_corrupt chunk 2 has sequence number 2000, but expected sequence number 2/, - qr/${header}toast value $value_id_to_corrupt chunk 3 has sequence number 3000, but expected sequence number 3/, - qr/${header}toast value $value_id_to_corrupt chunk 4 has sequence number 4000, but expected sequence number 4/, - qr/${header}toast value $value_id_to_corrupt chunk 5 has sequence number 5000, but expected sequence number 5/; - + qr/${header}toast value 16460 index scan returned chunk 1000 when expecting chunk 1/, + qr/${header}toast value 16460 chunk 1000 follows last expected chunk 5/, + qr/${header}toast value 16460 index scan returned chunk 2000 when expecting chunk 1001/, + qr/${header}toast value 16460 chunk 2000 follows last expected chunk 5/, + qr/${header}toast value 16460 index scan returned chunk 3000 when expecting chunk 2001/, + qr/${header}toast value 16460 chunk 3000 follows last expected chunk 5/, + qr/${header}toast value 16460 index scan returned chunk 4000 when expecting chunk 3001/, + qr/${header}toast value 16460 chunk 4000 follows last expected chunk 5/, + qr/${header}toast value 16460 index scan returned chunk 5000 when expecting chunk 4001/, + qr/${header}toast value 16460 chunk 5000 follows last expected chunk 5/; INSERT INTO $toastname (chunk_id, chunk_seq, chunk_data) (SELECT chunk_id, 10*chunk_seq + 1000, chunk_data FROM $toastname WHERE chunk_id = $value_id_to_corrupt) - qr/${header}toast value $value_id_to_corrupt chunk 6 has sequence number 1000, but expected sequence number 6/, - qr/${header}toast value $value_id_to_corrupt chunk 7 has sequence number 1010, but expected sequence number 7/, - qr/${header}toast value $value_id_to_corrupt chunk 8 has sequence number 1020, but expected sequence number 8/, - qr/${header}toast value $value_id_to_corrupt chunk 9 has sequence number 1030, but expected sequence number 9/, - qr/${header}toast value $value_id_to_corrupt chunk 10 has sequence number 1040, but expected sequence number 10/, - qr/${header}toast value $value_id_to_corrupt chunk 11 has sequence number 1050, but expected sequence number 11/, - qr/${header}toast value $value_id_to_corrupt was expected to end at chunk 6, but ended at chunk 12/; + qr/${header}toast value $value_id_to_corrupt index scan returned chunk 1000 when expecting chunk 6/, + qr/${header}toast value $value_id_to_corrupt chunk 1000 follows last expected chunk 5/, + qr/${header}toast value $value_id_to_corrupt index scan returned chunk 1010 when expecting chunk 1001/, + qr/${header}toast value $value_id_to_corrupt chunk 1010 follows last expected chunk 5/, +
Re: pg_amcheck contrib application
> On Apr 23, 2021, at 10:28 AM, Robert Haas wrote: > > On Thu, Apr 22, 2021 at 7:28 PM Mark Dilger > wrote: >> I have refactored the patch to address your other concerns. Breaking the >> patch into multiple pieces didn't add any clarity, but refactoring portions >> of it made things simpler to read, I think, so here it is as one patch file. > > I was hoping that this version was going to be smaller than the last > version, but instead it went from 300+ lines to 500+ lines. > > The main thing I'm unhappy about in the status quo is the use of > chunkno in error messages. I have suggested several times making that > concept go away, because I think users will be confused. Here's a > minimal patch that does just that. It's 32 lines and results in a net > removal of 4 lines. It differs somewhat from my earlier suggestions, > because my priority here is to get reasonably understandable output > without needing a ton of code, and as I was working on this I found > that some of my earlier suggestions would have needed more code to > implement and I didn't think it bought enough to be worth it. It's > possible this is too simple, or that it's buggy, so let me know what > you think. But basically, I think what got committed before is > actually mostly fine and doesn't need major revision. It just needs > tidying up to avoid the confusing chunkno concept. > > Now, the other thing we've talked about is adding a few more checks, > to verify for example that the toastrelid is what we expect, and I see > in your v22 you thought of a few other things. I think we can consider > those, possibly as things where we consider it tidying up loose ends > for v14, or else as improvements for v15. But I don't think that the > fairly large size of your patch comes primarily from additional > checks. I think it mostly comes from the code to produce error reports > getting a lot more complicated. I apologize if my comments have driven > that complexity, but they weren't intended to. > > One tiny problem with the attached patch is that it does not make any > regression tests fail, which also makes it hard for me to tell if it > breaks anything, or if the existing code works. I don't know how > practical it is to do anything about that. Do you have a patch handy > that allows manual updates and deletes on TOAST tables, for manual > testing purposes? Yes, I haven't been posting that with the patch because, but I will test your patch and see what differs. — Mark Dilger EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: pg_amcheck contrib application
On Thu, Apr 22, 2021 at 7:28 PM Mark Dilger wrote: > I have refactored the patch to address your other concerns. Breaking the > patch into multiple pieces didn't add any clarity, but refactoring portions > of it made things simpler to read, I think, so here it is as one patch file. I was hoping that this version was going to be smaller than the last version, but instead it went from 300+ lines to 500+ lines. The main thing I'm unhappy about in the status quo is the use of chunkno in error messages. I have suggested several times making that concept go away, because I think users will be confused. Here's a minimal patch that does just that. It's 32 lines and results in a net removal of 4 lines. It differs somewhat from my earlier suggestions, because my priority here is to get reasonably understandable output without needing a ton of code, and as I was working on this I found that some of my earlier suggestions would have needed more code to implement and I didn't think it bought enough to be worth it. It's possible this is too simple, or that it's buggy, so let me know what you think. But basically, I think what got committed before is actually mostly fine and doesn't need major revision. It just needs tidying up to avoid the confusing chunkno concept. Now, the other thing we've talked about is adding a few more checks, to verify for example that the toastrelid is what we expect, and I see in your v22 you thought of a few other things. I think we can consider those, possibly as things where we consider it tidying up loose ends for v14, or else as improvements for v15. But I don't think that the fairly large size of your patch comes primarily from additional checks. I think it mostly comes from the code to produce error reports getting a lot more complicated. I apologize if my comments have driven that complexity, but they weren't intended to. One tiny problem with the attached patch is that it does not make any regression tests fail, which also makes it hard for me to tell if it breaks anything, or if the existing code works. I don't know how practical it is to do anything about that. Do you have a patch handy that allows manual updates and deletes on TOAST tables, for manual testing purposes? -- Robert Haas EDB: http://www.enterprisedb.com simply-remove-chunkno-concept.patch Description: Binary data
Re: pg_amcheck contrib application
> On Apr 19, 2021, at 5:07 PM, Mark Dilger wrote: > > > >> On Apr 19, 2021, at 12:50 PM, Robert Haas wrote: >> >> On Thu, Apr 15, 2021 at 1:07 PM Mark Dilger >> wrote: >>> I have added the verb "has" rather than "contains" because "has" is more >>> consistent with the phrasing of other similar corruption reports. >> >> That makes sense. I have refactored the patch to address your other concerns. Breaking the patch into multiple pieces didn't add any clarity, but refactoring portions of it made things simpler to read, I think, so here it is as one patch file. v22-0001-amcheck-adding-toast-pointer-corruption-checks.patch Description: Binary data — Mark Dilger EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: pg_amcheck contrib application
> On Apr 19, 2021, at 12:50 PM, Robert Haas wrote: > > On Thu, Apr 15, 2021 at 1:07 PM Mark Dilger > wrote: >> I have added the verb "has" rather than "contains" because "has" is more >> consistent with the phrasing of other similar corruption reports. > > That makes sense. > > I think it's odd that a range of extraneous chunks is collapsed into a > single report if the size of each chunk happens to be > TOAST_MAX_CHUNK_SIZE and not otherwise. Why not just remember the > first and last extraneous chunk and the size of each? If the next > chunk you see is the next one in sequence and the same size as all the > others, extend your notion of the sequence end by 1. Otherwise, report > the range accumulated so far. It seems to me that this wouldn't be any > more code than you have now, and might actually be less. In all cases of uncorrupted toasted attributes, the sequence of N chunks that make up the attribute should be N-1 chunks of TOAST_MAX_CHUNK_SIZE ending with a single chunk of up to TOAST_MAX_CHUNK_SIZE. I'd like to refer to such sequences as "reasonably sized" sequences to make conversation easier. If the toast pointer's va_extsize field leads us to believe that we should find 10 reasonably sized chunks, but instead we find 30 reasonably sized chunks, we know something is corrupt. We shouldn't automatically prejudice the user against the additional 20 chunks. We didn't expect them, but maybe that's because va_extsize was corrupt and gave us a false expectation. We're not pointing fingers one way or the other. On the other hand, if we expect 10 chunks and find an additional 20 unreasonably sized chunks, we can and should point fingers at the extra 20 chunks. Even if we somehow knew that va_extsize was also corrupt, we'd still be justified in saying the 20 unreasonably sized chunks are each, individually corrupt. I tried to write the code to report one corruption message per corruption found. There are some edge cases where this is a definitional challenge, so it's not easy to say that I've always achieved this goal, but I think i've done so where the definitions are clear. As such, the only time I'd want to combine toast chunks into a single corruption message is when they are not in themselves necessarily *individually* corrupt. That is why I wrote the code to use TOAST_MAX_CHUNK_SIZE rather than just storing up any series of equally sized chunks. On a related note, when complaining about a sequence of toast chunks, often the sequence is something like [maximal, maximal, ..., maximal, partial], but sometimes it's just [maximal...maximal], sometimes just [maximal], and sometimes just [partial]. If I'm complaining about that entire sequence, I'd really like to do so in just one message, otherwise it looks like separate complaints. I can certainly change the code to be how you are asking, but I'd first like to know that you really understood what I was doing here and why the reports read the way they do. > I think that report_missing_chunks() could probably just report the > range of missing chunks and not bother reporting how big they were > expected to be. But, if it is going to report how big they were > expected to be, I think it should have only 2 cases rather than 4: > either a range of missing chunks of equal size, or a single missing > chunk of some size. If, as I propose, it doesn't report the expected > size, then you still have just 2 cases: a range of missing chunks, or > a single missing chunk. Right, this is the same as above. I'm trying not to split a single corruption complaint into separate reports. > Somehow I have a hard time feeling confident that check_toast_tuple() > is going to do the right thing. The logic is really complex and hard > to understand. 'chunkno' is a counter that advances every time we move > to the next chunk, and 'curchunk' is the value we actually find in the > TOAST tuple. This terminology is not easy to understand. Most messages > now report 'curchunk', but some still report 'chunkno'. Why does > 'chunkno' need to exist at all? AFAICS the combination of 'curchunk' > and 'tctx->last_chunk_seen' ought to be sufficient. I can see no > particular reason why what you're calling 'chunkno' needs to exist > even as a local variable, let alone be printed out. Either we haven't > yet validated that the chunk_id extracted from the tuple is non-null > and greater than the last chunk number we saw, in which case we can > just complain about it if we find it to be otherwise, or we have > already done that validation, in which case we should complain about > that value and not 'chunkno' in any subsequent messages. If we use tctx->last_chunk_seen as you propose, I imagine we'd set that to -1 prior to the first call to check_toast_tuple(). In the first call, we'd extract the toast chunk_seq and store it in curchunk and verify that it's one greater than tctx->last_chunk_seen. That all seems fine. But under corrupt
Re: pg_amcheck contrib application
On Thu, Apr 15, 2021 at 1:07 PM Mark Dilger wrote: > I have added the verb "has" rather than "contains" because "has" is more > consistent with the phrasing of other similar corruption reports. That makes sense. I think it's odd that a range of extraneous chunks is collapsed into a single report if the size of each chunk happens to be TOAST_MAX_CHUNK_SIZE and not otherwise. Why not just remember the first and last extraneous chunk and the size of each? If the next chunk you see is the next one in sequence and the same size as all the others, extend your notion of the sequence end by 1. Otherwise, report the range accumulated so far. It seems to me that this wouldn't be any more code than you have now, and might actually be less. I think that report_missing_chunks() could probably just report the range of missing chunks and not bother reporting how big they were expected to be. But, if it is going to report how big they were expected to be, I think it should have only 2 cases rather than 4: either a range of missing chunks of equal size, or a single missing chunk of some size. If, as I propose, it doesn't report the expected size, then you still have just 2 cases: a range of missing chunks, or a single missing chunk. Somehow I have a hard time feeling confident that check_toast_tuple() is going to do the right thing. The logic is really complex and hard to understand. 'chunkno' is a counter that advances every time we move to the next chunk, and 'curchunk' is the value we actually find in the TOAST tuple. This terminology is not easy to understand. Most messages now report 'curchunk', but some still report 'chunkno'. Why does 'chunkno' need to exist at all? AFAICS the combination of 'curchunk' and 'tctx->last_chunk_seen' ought to be sufficient. I can see no particular reason why what you're calling 'chunkno' needs to exist even as a local variable, let alone be printed out. Either we haven't yet validated that the chunk_id extracted from the tuple is non-null and greater than the last chunk number we saw, in which case we can just complain about it if we find it to be otherwise, or we have already done that validation, in which case we should complain about that value and not 'chunkno' in any subsequent messages. The conditionals between where you set max_valid_prior_chunk and where you set last_chunk_seen seem hard to understand, particularly the bifurcated way that missing chunks are reported. Initial missing chunks are detected by (chunkno == 0 && max_valid_prior_chunk >= 0) and later missing chunks are detected by (chunkno > 0 && max_valid_prior_chunk > tctx->last_chunk_seen). I'm not sure if this is correct; I find it hard to get my head around what max_valid_prior_chunk is supposed to represent. But in any case I think it can be written more simply. Just keep track of what chunk_id we expect to extract from the next TOAST tuple. Initially it's 0. Then: if (chunkno < tctx->expected_chunkno) { // toast value %u index scan returned chunk number %d when chunk %d was expected // don't modify tctx->expected_chunkno here, just hope the next thing matches our previous expectation } else { if (chunkno > tctx->expected_chunkno) // chunks are missing from tctx->expected_chunkno through Min(chunkno - 1, tctx->final_expected_chunk), provided that the latter value is greater than or equal to the former tctx->expected_chunkno = chunkno + 1; } If you do this, you only need to report extraneous chunks when chunkno > tctx->final_expected_chunk, since chunkno < 0 is guaranteed to trigger the first of the two complaints shown above. In check_tuple_attribute I suggest "bool valid = false" rather than "bool invalid = true". I think it's easier to understand. I object to check_toasted_attribute() using 'chunkno' in a message for the same reasons as above in regards to check_toast_tuple() i.e. I think it's a concept which should not exist. I think this patch could possibly be split up into multiple patches. There's some question in my mind whether it's getting too late to commit any of this, since some of it looks suspiciously like new features after feature freeze. However, I kind of hate to ship this release without at least doing something about the chunkno vs. curchunk stuff, which is even worse in the committed code than in your patch, and which I think will confuse the heck out of users if those messages actually fire for anyone. -- Robert Haas EDB: http://www.enterprisedb.com
Re: pg_amcheck contrib application
> On Apr 14, 2021, at 10:17 AM, Robert Haas wrote: > > On Mon, Apr 12, 2021 at 11:06 PM Mark Dilger > wrote: >> It now reports: >> >> # heap table "postgres"."public"."test", block 0, offset 18, attribute 2: >> # toast value 16461 missing chunk 3 with expected size 1996 >> # heap table "postgres"."public"."test", block 0, offset 18, attribute 2: >> # toast value 16461 was expected to end at chunk 6 with total size >> 1, but ended at chunk 5 with total size 8004 >> >> It sounds like you weren't expecting the second of these reports. I think >> it is valuable, especially when there are multiple missing chunks and >> multiple extraneous chunks, as it makes it easier for the user to reconcile >> the missing chunks against the extraneous chunks. > > I wasn't, but I'm not overwhelmingly opposed to it, either. I do think > I would be in favor of splitting this kind of thing up into two > messages: > > # toast value 16459 unexpected chunks 1000 through 1004 each with > size 1996 followed by chunk 1005 with size 20 > > We'll have fewer message variants and I don't think any real > regression in usability if we say: > > # toast value 16459 has unexpected chunks 1000 through 1004 each > with size 1996 > # toast value 16459 has unexpected chunk 1005 with size 20 Changed. > (Notice that I also inserted "has" so that the sentence a verb. Or we > could "contains.") I have added the verb "has" rather than "contains" because "has" is more consistent with the phrasing of other similar corruption reports. v21-0001-amcheck-adding-toast-pointer-corruption-checks.patch Description: Binary data — Mark Dilger EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: pg_amcheck contrib application
On Mon, Apr 12, 2021 at 11:06 PM Mark Dilger wrote: > It now reports: > > # heap table "postgres"."public"."test", block 0, offset 18, attribute 2: > # toast value 16461 missing chunk 3 with expected size 1996 > # heap table "postgres"."public"."test", block 0, offset 18, attribute 2: > # toast value 16461 was expected to end at chunk 6 with total size 1, > but ended at chunk 5 with total size 8004 > > It sounds like you weren't expecting the second of these reports. I think it > is valuable, especially when there are multiple missing chunks and multiple > extraneous chunks, as it makes it easier for the user to reconcile the > missing chunks against the extraneous chunks. I wasn't, but I'm not overwhelmingly opposed to it, either. I do think I would be in favor of splitting this kind of thing up into two messages: # toast value 16459 unexpected chunks 1000 through 1004 each with size 1996 followed by chunk 1005 with size 20 We'll have fewer message variants and I don't think any real regression in usability if we say: # toast value 16459 has unexpected chunks 1000 through 1004 each with size 1996 # toast value 16459 has unexpected chunk 1005 with size 20 (Notice that I also inserted "has" so that the sentence a verb. Or we could "contains.") I committed 0001. -- Robert Haas EDB: http://www.enterprisedb.com
Re: pg_amcheck contrib application
> On Apr 9, 2021, at 1:51 PM, Robert Haas wrote: > > On Fri, Apr 9, 2021 at 2:50 PM Mark Dilger > wrote: >> I think #4, above, requires some clarification. If there are missing >> chunks, the very definition of how large we expect subsequent chunks to be >> is ill-defined. I took a fairly conservative approach to avoid lots of >> bogus complaints about chunks that are of unexpected size. Not all such >> complaints are removed, but enough are removed that I needed to add a final >> complaint at the end about the total size seen not matching the total size >> expected. > > My instinct is to suppose that the size that we expect for future > chunks is independent of anything being wrong with previous chunks. So > if each chunk is supposed to be 2004 bytes (which probably isn't the > real number) and the value is 7000 bytes long, we expect chunks 0-2 to > be 2004 bytes each, chunk 3 to be 988 bytes, and chunk 4 and higher to > not exist. If chunk 1 happens to be missing or the wrong length or > whatever, our expectations for chunks 2 and 3 are utterly unchanged. Fair enough. >> Corruption #1: >> >>UPDATE $toastname SET chunk_seq = chunk_seq + 1000 >> >> Before: >> >> # heap table "postgres"."public"."test", block 0, offset 2, attribute 2: >> # toast value 16445 chunk 0 has sequence number 1000, but expected >> sequence number 0 >> # heap table "postgres"."public"."test", block 0, offset 2, attribute 2: >> # toast value 16445 chunk 1 has sequence number 1001, but expected >> sequence number 1 >> # heap table "postgres"."public"."test", block 0, offset 2, attribute 2: >> # toast value 16445 chunk 2 has sequence number 1002, but expected >> sequence number 2 >> # heap table "postgres"."public"."test", block 0, offset 2, attribute 2: >> # toast value 16445 chunk 3 has sequence number 1003, but expected >> sequence number 3 >> # heap table "postgres"."public"."test", block 0, offset 2, attribute 2: >> # toast value 16445 chunk 4 has sequence number 1004, but expected >> sequence number 4 >> # heap table "postgres"."public"."test", block 0, offset 2, attribute 2: >> # toast value 16445 chunk 5 has sequence number 1005, but expected >> sequence number 5 >> >> After: >> >> # heap table "postgres"."public"."test", block 0, offset 2, attribute 2: >> # toast value 16445 missing chunks 0 through 999 > > Applying the above principle would lead to complaints that chunks 0-5 > are missing, and 1000-1005 are extra. That sounds right. It now reports: # heap table "postgres"."public"."test", block 0, offset 16, attribute 2: # toast value 16459 missing chunks 0 through 4 with expected size 1996 and chunk 5 with expected size 20 # heap table "postgres"."public"."test", block 0, offset 16, attribute 2: # toast value 16459 unexpected chunks 1000 through 1004 each with size 1996 followed by chunk 1005 with size 20 >> Corruption #2: >> >>UPDATE $toastname SET chunk_seq = chunk_seq * 1000 > > Similarly here, except the extra chunk numbers are different. It now reports: # heap table "postgres"."public"."test", block 0, offset 17, attribute 2: # toast value 16460 missing chunks 1 through 4 with expected size 1996 and chunk 5 with expected size 20 # heap table "postgres"."public"."test", block 0, offset 17, attribute 2: # toast value 16460 unexpected chunk 1000 with size 1996 # heap table "postgres"."public"."test", block 0, offset 17, attribute 2: # toast value 16460 unexpected chunk 2000 with size 1996 # heap table "postgres"."public"."test", block 0, offset 17, attribute 2: # toast value 16460 unexpected chunk 3000 with size 1996 # heap table "postgres"."public"."test", block 0, offset 17, attribute 2: # toast value 16460 unexpected chunk 4000 with size 1996 # heap table "postgres"."public"."test", block 0, offset 17, attribute 2: # toast value 16460 unexpected chunk 5000 with size 20 I don't see any good way in this case to report the extra chunks in one row, as in the general case there could be arbitrarily many of them, with the message text getting arbitrarily large if it reported the chunks as "chunks 1000, 2000, 3000, 4000, 5000, ...". I don't expect this sort of corruption being particularly common, though, so I'm not too bothered about it. > >> Corruption #3: >> >>UPDATE $toastname SET chunk_id = (chunk_id::integer + 1000)::oid >> WHERE chunk_seq = 3 > > And here we'd just get a complaint that chunk 3 is missing. It now reports: # heap table "postgres"."public"."test", block 0, offset 18, attribute 2: # toast value 16461 missing chunk 3 with expected size 1996 # heap table "postgres"."public"."test", block 0, offset 18, attribute 2: # toast value 16461 was expected to end at chunk 6 with total size 1, but ended at chunk 5 with total size 8004 It sounds like you weren't expecting the second of these reports. I think it is valuable, especially when there are multiple
Re: pg_amcheck contrib application
On Fri, Apr 9, 2021 at 2:50 PM Mark Dilger wrote: > I think #4, above, requires some clarification. If there are missing chunks, > the very definition of how large we expect subsequent chunks to be is > ill-defined. I took a fairly conservative approach to avoid lots of bogus > complaints about chunks that are of unexpected size. Not all such > complaints are removed, but enough are removed that I needed to add a final > complaint at the end about the total size seen not matching the total size > expected. My instinct is to suppose that the size that we expect for future chunks is independent of anything being wrong with previous chunks. So if each chunk is supposed to be 2004 bytes (which probably isn't the real number) and the value is 7000 bytes long, we expect chunks 0-2 to be 2004 bytes each, chunk 3 to be 988 bytes, and chunk 4 and higher to not exist. If chunk 1 happens to be missing or the wrong length or whatever, our expectations for chunks 2 and 3 are utterly unchanged. > Corruption #1: > > UPDATE $toastname SET chunk_seq = chunk_seq + 1000 > > Before: > > # heap table "postgres"."public"."test", block 0, offset 2, attribute 2: > # toast value 16445 chunk 0 has sequence number 1000, but expected > sequence number 0 > # heap table "postgres"."public"."test", block 0, offset 2, attribute 2: > # toast value 16445 chunk 1 has sequence number 1001, but expected > sequence number 1 > # heap table "postgres"."public"."test", block 0, offset 2, attribute 2: > # toast value 16445 chunk 2 has sequence number 1002, but expected > sequence number 2 > # heap table "postgres"."public"."test", block 0, offset 2, attribute 2: > # toast value 16445 chunk 3 has sequence number 1003, but expected > sequence number 3 > # heap table "postgres"."public"."test", block 0, offset 2, attribute 2: > # toast value 16445 chunk 4 has sequence number 1004, but expected > sequence number 4 > # heap table "postgres"."public"."test", block 0, offset 2, attribute 2: > # toast value 16445 chunk 5 has sequence number 1005, but expected > sequence number 5 > > After: > > # heap table "postgres"."public"."test", block 0, offset 2, attribute 2: > # toast value 16445 missing chunks 0 through 999 Applying the above principle would lead to complaints that chunks 0-5 are missing, and 1000-1005 are extra. > Corruption #2: > > UPDATE $toastname SET chunk_seq = chunk_seq * 1000 Similarly here, except the extra chunk numbers are different. > Corruption #3: > > UPDATE $toastname SET chunk_id = (chunk_id::integer + 1000)::oid > WHERE chunk_seq = 3 And here we'd just get a complaint that chunk 3 is missing. -- Robert Haas EDB: http://www.enterprisedb.com
Re: pg_amcheck contrib application
> On Apr 8, 2021, at 3:11 PM, Robert Haas wrote: > > On Thu, Apr 8, 2021 at 5:21 PM Mark Dilger > wrote: >> All this leads me to believe that we should report the following: >> >> 1) If the total number of chunks retrieved differs from the expected number, >> report how many we expected vs. how many we got >> 2) If the chunk_seq numbers are discontiguous, report each discontiguity. >> 3) If the index scan returned chunks out of chunk_seq order, report that >> 4) If any chunk is not the expected size, report that >> >> So, for your example of chunk 1 missing from chunks [0..17], we'd report >> that we got one fewer chunks than we expected, that the second chunk seen >> was discontiguous from the first chunk seen, that the final chunk seen was >> smaller than expected by M bytes, and that the total size was smaller than >> we expected by N bytes. The third of those is somewhat misleading, since >> the final chunk was presumably the right size; we just weren't expecting to >> hit a partial chunk quite yet. But I don't see how to make that better in >> the general case. > > Hmm, that might be OK. It seems like it's going to be a bit verbose in > simple cases like 1 missing chunk, but on the plus side, it avoids a > mountain of output if the raw size has been overwritten with a > gigantic bogus value. But, how is #2 different from #3? Those sound > like the same thing to me. I think #4, above, requires some clarification. If there are missing chunks, the very definition of how large we expect subsequent chunks to be is ill-defined. I took a fairly conservative approach to avoid lots of bogus complaints about chunks that are of unexpected size. Not all such complaints are removed, but enough are removed that I needed to add a final complaint at the end about the total size seen not matching the total size expected. Here are a set of corruptions with the corresponding corruption reports from before and from after the code changes. The corruptions are *not* cumulative. Honestly, I'm not totally convinced that these changes are improvements in all cases. Let me know if you want further changes, or if you'd like to see other corruptions and their before and after results. Corruption #1: UPDATE $toastname SET chunk_seq = chunk_seq + 1000 Before: # heap table "postgres"."public"."test", block 0, offset 2, attribute 2: # toast value 16445 chunk 0 has sequence number 1000, but expected sequence number 0 # heap table "postgres"."public"."test", block 0, offset 2, attribute 2: # toast value 16445 chunk 1 has sequence number 1001, but expected sequence number 1 # heap table "postgres"."public"."test", block 0, offset 2, attribute 2: # toast value 16445 chunk 2 has sequence number 1002, but expected sequence number 2 # heap table "postgres"."public"."test", block 0, offset 2, attribute 2: # toast value 16445 chunk 3 has sequence number 1003, but expected sequence number 3 # heap table "postgres"."public"."test", block 0, offset 2, attribute 2: # toast value 16445 chunk 4 has sequence number 1004, but expected sequence number 4 # heap table "postgres"."public"."test", block 0, offset 2, attribute 2: # toast value 16445 chunk 5 has sequence number 1005, but expected sequence number 5 After: # heap table "postgres"."public"."test", block 0, offset 2, attribute 2: # toast value 16445 missing chunks 0 through 999 Corruption #2: UPDATE $toastname SET chunk_seq = chunk_seq * 1000 Before: # heap table "postgres"."public"."test", block 0, offset 2, attribute 2: # toast value 16445 chunk 1 has sequence number 1000, but expected sequence number 1 # heap table "postgres"."public"."test", block 0, offset 2, attribute 2: # toast value 16445 chunk 2 has sequence number 2000, but expected sequence number 2 # heap table "postgres"."public"."test", block 0, offset 2, attribute 2: # toast value 16445 chunk 3 has sequence number 3000, but expected sequence number 3 # heap table "postgres"."public"."test", block 0, offset 2, attribute 2: # toast value 16445 chunk 4 has sequence number 4000, but expected sequence number 4 # heap table "postgres"."public"."test", block 0, offset 2, attribute 2: # toast value 16445 chunk 5 has sequence number 5000, but expected sequence number 5 After: # heap table "postgres"."public"."test", block 0, offset 2, attribute 2: # toast value 16445 missing chunks 1 through 999 # heap table "postgres"."public"."test", block 0, offset 2, attribute 2: # toast value 16445 missing chunks 1001 through 1999 # heap table "postgres"."public"."test", block 0, offset 2, attribute 2: # toast value 16445 missing chunks 2001 through 2999 # heap table "postgres"."public"."test", block 0, offset 2, attribute 2: # toast value 16445 missing chunks 3001 through 3999 # heap table "postgres"."public"."test", block 0, offset 2, attribute 2: # toast value 16445 missing chunks 4001 through 4999 #
Re: pg_amcheck contrib application
On Thu, Apr 8, 2021 at 6:51 PM Mark Dilger wrote: > #2 is if chunk_seq goes up but skips numbers. #3 is if chunk_seq ever goes > down, meaning the index scan did something unexpected. Yeah, sure. But I think we could probably treat those the same way. -- Robert Haas EDB: http://www.enterprisedb.com
Re: pg_amcheck contrib application
> On Apr 8, 2021, at 3:11 PM, Robert Haas wrote: > > On Thu, Apr 8, 2021 at 5:21 PM Mark Dilger > wrote: >> All this leads me to believe that we should report the following: >> >> 1) If the total number of chunks retrieved differs from the expected number, >> report how many we expected vs. how many we got >> 2) If the chunk_seq numbers are discontiguous, report each discontiguity. >> 3) If the index scan returned chunks out of chunk_seq order, report that >> 4) If any chunk is not the expected size, report that >> >> So, for your example of chunk 1 missing from chunks [0..17], we'd report >> that we got one fewer chunks than we expected, that the second chunk seen >> was discontiguous from the first chunk seen, that the final chunk seen was >> smaller than expected by M bytes, and that the total size was smaller than >> we expected by N bytes. The third of those is somewhat misleading, since >> the final chunk was presumably the right size; we just weren't expecting to >> hit a partial chunk quite yet. But I don't see how to make that better in >> the general case. > > Hmm, that might be OK. It seems like it's going to be a bit verbose in > simple cases like 1 missing chunk, but on the plus side, it avoids a > mountain of output if the raw size has been overwritten with a > gigantic bogus value. But, how is #2 different from #3? Those sound > like the same thing to me. #2 is if chunk_seq goes up but skips numbers. #3 is if chunk_seq ever goes down, meaning the index scan did something unexpected. — Mark Dilger EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: pg_amcheck contrib application
On Thu, Apr 8, 2021 at 5:21 PM Mark Dilger wrote: > All this leads me to believe that we should report the following: > > 1) If the total number of chunks retrieved differs from the expected number, > report how many we expected vs. how many we got > 2) If the chunk_seq numbers are discontiguous, report each discontiguity. > 3) If the index scan returned chunks out of chunk_seq order, report that > 4) If any chunk is not the expected size, report that > > So, for your example of chunk 1 missing from chunks [0..17], we'd report that > we got one fewer chunks than we expected, that the second chunk seen was > discontiguous from the first chunk seen, that the final chunk seen was > smaller than expected by M bytes, and that the total size was smaller than we > expected by N bytes. The third of those is somewhat misleading, since the > final chunk was presumably the right size; we just weren't expecting to hit a > partial chunk quite yet. But I don't see how to make that better in the > general case. Hmm, that might be OK. It seems like it's going to be a bit verbose in simple cases like 1 missing chunk, but on the plus side, it avoids a mountain of output if the raw size has been overwritten with a gigantic bogus value. But, how is #2 different from #3? Those sound like the same thing to me. -- Robert Haas EDB: http://www.enterprisedb.com
Re: pg_amcheck contrib application
> On Apr 8, 2021, at 1:05 PM, Robert Haas wrote: > > On Thu, Apr 8, 2021 at 3:02 PM Mark Dilger > wrote: >> Imagine a toasted attribute with 18 chunks numbered [0..17]. Then we update >> the toast to have only 6 chunks numbered [0..5] except we corruptly keep >> chunks numbered [12..17] in the toast table. We'd rather see a report like >> this: > [ toast value NNN chunk NNN has sequence number NNN, but expected > sequence number NNN ] >> than one like this: > [ toast value NNN contains chunk NNN where chunk NNN was expected ] >> >> because saying the toast value ended at "chunk 12" after saying that it >> contains "chunk 17" is contradictory. You need the distinction between the >> chunk number and the chunk sequence number, since in corrupt circumstances >> they may not be the same. > > Hmm, I see your point, and that's a good example to illustrate it. > But, with that example in front of me, I am rather doubtful that > either of these is what users actually want. Consider the case where I > should have chunks 0..17 and chunk 1 is just plain gone. This, by the > way, seems like a pretty likely case to arise in practice, since all > we need is for a block to get truncated away or zeroed erroneously, or > for a tuple to get pruned that shouldn't. With either of the above > schemes, I guess we're going to get a message about every chunk from 2 > to 17, complaining that they're all misnumbered. We might also get a > complaint that the last chunk is the wrong size, and that the total > number of chunks isn't right. What we really want is a single > complaint saying chunk 1 is missing. > > Likewise, in your example, I sort of feel like what I really want, > rather than either of the above outputs, is to get some messages like > this: > > toast value NNN contains unexpected extra chunk [12-17] > > Both your phrasing for those messages and what I suggested make it > sound like the problem is that the chunk number is wrong. But that > doesn't seem like it's taking the right view of the situation. Chunks > 12-17 shouldn't exist at all, and if they do, we should say that, e.g. > by complaining about something like "toast value 16444 chunk 12 > follows last expected chunk 5" > > In other words, I don't buy the idea that the user will accept the > idea that there's a chunk number and a chunk sequence number, and that > they should know the difference between those things and what each of > them are. They're entitled to imagine that there's just one thing, and > that we're going to tell them about value that are extra or missing. > The fact that we're not doing that seems like it's just a matter of > missing code. Somehow, we have to get enough information about chunk_seq discontinuity into the output that if the user forwards it to -hackers, or if the output comes from a buildfarm critter, that we have all the information to help diagnose what went wrong. As a specific example, if the va_rawsize suggests 2 chunks, and we find 150 chunks all with contiguous chunk_seq values, that is different from a debugging point of view than if we find 150 chunks with chunk_seq values spread all over the [0..MAXINT] range. We can't just tell the user that there were 148 extra chunks. We also shouldn't phrase the error in terms of "extra chunks", since it might be the va_rawsize that is corrupt. I agree that the current message output might be overly verbose in how it reports this information. Conceptually, we want to store up information about the chunk issues and report them all at once, but that's hard to do in general, as the number of chunk_seq discontinuities might be quite large, much too large to fit reasonably into any one message. Maybe we could report just the first N discontinuities rather than all of them, but if somebody wants to open a hex editor and walk through the toast table, they won't appreciate having the corruption information truncated like that. All this leads me to believe that we should report the following: 1) If the total number of chunks retrieved differs from the expected number, report how many we expected vs. how many we got 2) If the chunk_seq numbers are discontiguous, report each discontiguity. 3) If the index scan returned chunks out of chunk_seq order, report that 4) If any chunk is not the expected size, report that So, for your example of chunk 1 missing from chunks [0..17], we'd report that we got one fewer chunks than we expected, that the second chunk seen was discontiguous from the first chunk seen, that the final chunk seen was smaller than expected by M bytes, and that the total size was smaller than we expected by N bytes. The third of those is somewhat misleading, since the final chunk was presumably the right size; we just weren't expecting to hit a partial chunk quite yet. But I don't see how to make that better in the general case. > If we start the index scan and get chunk 4, we can > easily emit messages for chunks 0..3
Re: pg_amcheck contrib application
On Thu, Apr 8, 2021 at 3:02 PM Mark Dilger wrote: > Imagine a toasted attribute with 18 chunks numbered [0..17]. Then we update > the toast to have only 6 chunks numbered [0..5] except we corruptly keep > chunks numbered [12..17] in the toast table. We'd rather see a report like > this: [ toast value NNN chunk NNN has sequence number NNN, but expected sequence number NNN ] > than one like this: [ toast value NNN contains chunk NNN where chunk NNN was expected ] > > because saying the toast value ended at "chunk 12" after saying that it > contains "chunk 17" is contradictory. You need the distinction between the > chunk number and the chunk sequence number, since in corrupt circumstances > they may not be the same. Hmm, I see your point, and that's a good example to illustrate it. But, with that example in front of me, I am rather doubtful that either of these is what users actually want. Consider the case where I should have chunks 0..17 and chunk 1 is just plain gone. This, by the way, seems like a pretty likely case to arise in practice, since all we need is for a block to get truncated away or zeroed erroneously, or for a tuple to get pruned that shouldn't. With either of the above schemes, I guess we're going to get a message about every chunk from 2 to 17, complaining that they're all misnumbered. We might also get a complaint that the last chunk is the wrong size, and that the total number of chunks isn't right. What we really want is a single complaint saying chunk 1 is missing. Likewise, in your example, I sort of feel like what I really want, rather than either of the above outputs, is to get some messages like this: toast value NNN contains unexpected extra chunk [12-17] Both your phrasing for those messages and what I suggested make it sound like the problem is that the chunk number is wrong. But that doesn't seem like it's taking the right view of the situation. Chunks 12-17 shouldn't exist at all, and if they do, we should say that, e.g. by complaining about something like "toast value 16444 chunk 12 follows last expected chunk 5" In other words, I don't buy the idea that the user will accept the idea that there's a chunk number and a chunk sequence number, and that they should know the difference between those things and what each of them are. They're entitled to imagine that there's just one thing, and that we're going to tell them about value that are extra or missing. The fact that we're not doing that seems like it's just a matter of missing code. If we start the index scan and get chunk 4, we can easily emit messages for chunks 0..3 right on the spot, declaring them missing. Things do get a bit hairy if the index scan returns values out of order: what if it gives us chunk_seq = 2 and then chunk_seq = 1? But I think we could handle that by just issuing a complaint in any such case that "toast index returns chunks out of order for toast value NNN" and stopping further checking of that toast value. > Purely as manual testing, and not part of the patch, I hacked the backend a > bit to allow direct modification of the toast table. After corrupting the > toast with the following bit of SQL: > > WITH chunk_limit AS ( > SELECT chunk_id, MAX(chunk_seq) AS maxseq > FROM $toastname > GROUP BY chunk_id) > INSERT INTO $toastname (chunk_id, chunk_seq, chunk_data) > (SELECT t.chunk_id, > t.chunk_seq + cl.maxseq + CASE WHEN > t.chunk_seq < 3 THEN 1 ELSE 7 END, > t.chunk_data > FROM $toastname t > INNER JOIN chunk_limit cl > ON t.chunk_id = cl.chunk_id) > > pg_amcheck reports the following corruption messages: > > # heap table "postgres"."public"."test", block 0, offset 1, attribute 2: > # toast value 16444 chunk 6 follows last expected chunk 5 > # heap table "postgres"."public"."test", block 0, offset 1, attribute 2: > # toast value 16444 chunk 7 follows last expected chunk 5 > # heap table "postgres"."public"."test", block 0, offset 1, attribute 2: > # toast value 16444 chunk 8 follows last expected chunk 5 > # heap table "postgres"."public"."test", block 0, offset 1, attribute 2: > # toast value 16444 chunk 9 has sequence number 15, but expected sequence > number 9 > # heap table "postgres"."public"."test", block 0, offset 1, attribute 2: > # toast value 16444 chunk 10 has sequence number 16, but expected > sequence number 10 > # heap table "postgres"."public"."test", block 0, offset 1, attribute 2: > # toast value 16444 chunk 11 has sequence number 17, but expected > sequence number 11 > # heap table "postgres"."public"."test", block 0, offset 1, attribute 2: > # toast value 16444 was expected to end at chunk 6, but ended at chunk 12 > > I think if we'd left out the
Re: pg_amcheck contrib application
> On Apr 7, 2021, at 1:16 PM, Robert Haas wrote: > > On Sun, Apr 4, 2021 at 8:02 PM Mark Dilger > wrote: >> v18-0001 - Finishes work started in commit 3b6c1259f9 that was overlooked >> owing to how I had separated the changes in v17-0002 vs. v17-0003 > > Committed. Thank you. >> v18-0002 - Postpones the toast checks for a page until after the main table >> page lock is released > > Committed, but I changed list_free() to list_free_deep() to avoid a > memory leak, and I revised the commit message to mention the important > point that we need to avoid following TOAST pointers from > potentially-prunable tuples. Thank you, and yes, I agree with that change. >> v18-0003 - Improves the corruption messages in ways already discussed >> earlier in this thread. Changes the tests to expect the new messages, but >> adds no new checks > > Kibitizing your message wording: > > "toast value %u chunk data is null" -> "toast value %u chunk %d has > null data". We can mention the chunk number this way. Changed. > "toast value %u corrupt extended chunk has invalid varlena header: %0x > (sequence number %d)" -> "toast value %u chunk %d has invalid varlena > header %0x". We can be more consistent about how we incorporate the > chunk number into the text, and we don't really need to include the > word corrupt, because all of these are corruption complaints, and I > think it looks better without the colon. Changed. > "toast value %u chunk sequence number %u does not match the expected > sequence number %u" -> "toast value %u contains chunk %d where chunk > %d was expected". Shorter. Uses %d for a sequence number instead of > %u, which I think is correct -- anyway we should have them all one way > or all the other. I think I'd rather ditch the "sequence number" > technology and just talk about "chunk %d" or whatever. I don't agree with this one. I do agree with changing the message, but not to the message you suggest. Imagine a toasted attribute with 18 chunks numbered [0..17]. Then we update the toast to have only 6 chunks numbered [0..5] except we corruptly keep chunks numbered [12..17] in the toast table. We'd rather see a report like this: # heap table "postgres"."public"."test", block 0, offset 1, attribute 2: # toast value 16444 chunk 6 has sequence number 12, but expected sequence number 6 # heap table "postgres"."public"."test", block 0, offset 1, attribute 2: # toast value 16444 chunk 7 has sequence number 13, but expected sequence number 7 # heap table "postgres"."public"."test", block 0, offset 1, attribute 2: # toast value 16444 chunk 8 has sequence number 14, but expected sequence number 8 # heap table "postgres"."public"."test", block 0, offset 1, attribute 2: # toast value 16444 chunk 9 has sequence number 15, but expected sequence number 9 # heap table "postgres"."public"."test", block 0, offset 1, attribute 2: # toast value 16444 chunk 10 has sequence number 16, but expected sequence number 10 # heap table "postgres"."public"."test", block 0, offset 1, attribute 2: # toast value 16444 chunk 11 has sequence number 17, but expected sequence number 11 # heap table "postgres"."public"."test", block 0, offset 1, attribute 2: # toast value 16444 was expected to end at chunk 6, but ended at chunk 12 than one like this: # heap table "postgres"."public"."test", block 0, offset 1, attribute 2: # toast value 16444 contains chunk 12 where chunk 6 was expected # heap table "postgres"."public"."test", block 0, offset 1, attribute 2: # toast value 16444 contains chunk 13 where chunk 7 was expected # heap table "postgres"."public"."test", block 0, offset 1, attribute 2: # toast value 16444 contains chunk 14 where chunk 8 was expected # heap table "postgres"."public"."test", block 0, offset 1, attribute 2: # toast value 16444 contains chunk 15 where chunk 9 was expected # heap table "postgres"."public"."test", block 0, offset 1, attribute 2: # toast value 16444 contains chunk 16 where chunk 10 was expected # heap table "postgres"."public"."test", block 0, offset 1, attribute 2: # toast value 16444 contains chunk 17 where chunk 11 was expected # heap table "postgres"."public"."test", block 0, offset 1, attribute 2: # toast value 16444 was expected to end at chunk 6, but ended at chunk 12 because saying the toast value ended at "chunk 12" after saying that it contains "chunk 17" is contradictory. You need the distinction between the chunk number and the chunk sequence number, since in corrupt circumstances they may not be the same. > "toast value %u chunk sequence number %u exceeds the end chunk > sequence number %u" -> "toast value %u chunk %d follows last expected > chunk %d" Changed. > "toast value %u chunk size %u differs from the expected size %u" -> > "toast value %u chunk %d has size %u, but expected size %u" Changed. > Other complaints: > > Your commit message fails to mention the addition of >
Re: pg_amcheck contrib application
On Sun, Apr 4, 2021 at 8:02 PM Mark Dilger wrote: > v18-0001 - Finishes work started in commit 3b6c1259f9 that was overlooked > owing to how I had separated the changes in v17-0002 vs. v17-0003 Committed. > v18-0002 - Postpones the toast checks for a page until after the main table > page lock is released Committed, but I changed list_free() to list_free_deep() to avoid a memory leak, and I revised the commit message to mention the important point that we need to avoid following TOAST pointers from potentially-prunable tuples. > v18-0003 - Improves the corruption messages in ways already discussed earlier > in this thread. Changes the tests to expect the new messages, but adds no > new checks Kibitizing your message wording: "toast value %u chunk data is null" -> "toast value %u chunk %d has null data". We can mention the chunk number this way. "toast value %u corrupt extended chunk has invalid varlena header: %0x (sequence number %d)" -> "toast value %u chunk %d has invalid varlena header %0x". We can be more consistent about how we incorporate the chunk number into the text, and we don't really need to include the word corrupt, because all of these are corruption complaints, and I think it looks better without the colon. "toast value %u chunk sequence number %u does not match the expected sequence number %u" -> "toast value %u contains chunk %d where chunk %d was expected". Shorter. Uses %d for a sequence number instead of %u, which I think is correct -- anyway we should have them all one way or all the other. I think I'd rather ditch the "sequence number" technology and just talk about "chunk %d" or whatever. "toast value %u chunk sequence number %u exceeds the end chunk sequence number %u" -> "toast value %u chunk %d follows last expected chunk %d" "toast value %u chunk size %u differs from the expected size %u" -> "toast value %u chunk %d has size %u, but expected size %u" Other complaints: Your commit message fails to mention the addition of VARATT_EXTERNAL_GET_POINTER, which is a significant change/bug fix unrelated to message wording. It feels like we have a non-minimal number of checks/messages for the series of toast chunks. I think that if we find a chunk after the last chunk we were expecting to find (curchunk > endchunk) and you also get a message if we have the wrong number of chunks in total (chunkno != (endchunk + 1)). Now maybe I'm wrong, but if the first message triggers, it seems like the second message must also trigger. Is that wrong? If not, maybe we can get rid of the first one entirely? That's such a small change I think we could include it in this same patch, if it's a correct idea. On a related note, as I think I said before, I still think we should be rejiggering this so that we're not testing both the size of each individual chunk and the total size, because that ought to be redundant. That might be better done as a separate patch but I think we should try to clean it up. > v18-0004 - Adding corruption checks of toast pointers. Extends the > regression tests to cover the new checks. I think we could check that the result of VARATT_EXTERNAL_GET_COMPRESS_METHOD is one of the values we expect to see. Using AllocSizeIsValid() seems pretty vile. I know that MaxAllocSize is 0x3FFF in no small part because that's the maximum length that can be represented by a varlena, but I'm not sure it's a good idea to couple the concepts so closely like this. Maybe we can just #define VARLENA_SIZE_LIMIT in this file and use that, and a message that says size %u exceeds limit %u. I'm a little worried about whether the additional test cases are Endian-dependent at all. I don't immediately know what might be wrong with them, but I'm going to think about that some more later. Any chance you have access to a Big-endian box where you can test this? -- Robert Haas EDB: http://www.enterprisedb.com
Re: pg_amcheck contrib application
> On Apr 1, 2021, at 1:08 PM, Robert Haas wrote: > > > > - There are a total of two (2) calls in the current source code to > palloc0fast, and hundreds of calls to palloc0. So I think you should > forget about using the fast variant and just do what almost every > other caller does. Done. > - If you want to make this code faster, a better idea would be to > avoid doing all of this allocate and free work and just allocate an > array that's guaranteed to be big enough, and then keep track of how > many elements of that array are actually in use. Sounds like premature optimization to me. I only used palloc0fast because the argument is compile-time known. I wasn't specifically attempting to speed anything up. > - #ifdef DECOMPRESSION_CORRUPTION_CHECKING is not a useful way of > introducing such a feature. Either we do it for real and expose it via > SQL and pg_amcheck as an optional behavior, or we rip it out and > revisit it later. Given the nearness of feature freeze, my vote is for > the latter. > > - I'd remove the USE_LZ4 bit, too. Let's not define the presence of > LZ4 data in a non-LZ4-enabled cluster as corruption. If we do, then > people will expect to be able to use this to find places where they > are dependent on LZ4 if they want to move away from it -- and if we > don't recurse into composite datums, that will not actually work. Ok, I have removed this bit. I also removed the part of the patch that introduced a new corruption check, decompressing the data to see if it decompresses without error. > - check_toast_tuple() has an odd and slightly unclear calling > convention for which there are no comments. I wonder if it would be > better to reverse things and make bool *error the return value and > what is now the return value into a pointer argument, but whatever we > do I think it needs a few words in the comments. We don't need to > slavishly explain every argument -- I think toasttup and ctx and tctx > are reasonably clear -- but this is not. ... > - Is there a reason we need a cross-check on both the number of chunks > and on the total size? It seems to me that we should check that each > individual chunk has the size we expect, and that the total number of > chunks is what we expect. The overall size is then necessarily > correct. Good point. I've removed the extra check on the total size, since it cannot be wrong if the checks on the individual chunk sizes were all correct. This eliminates the need for the odd calling convention for check_toast_tuple(), so I've changed that to return void and not take any return-by-reference arguments. > - To me it would be more logical to reverse the order of the > toast_pointer.va_toastrelid != ctx->rel->rd_rel->reltoastrelid and > VARATT_EXTERNAL_GET_EXTSIZE(toast_pointer) > toast_pointer.va_rawsize > - VARHDRSZ checks. Whether we're pointing at the correct relation > feels more fundamental. Done. > - If we moved the toplevel foreach loop in check_toasted_attributes() > out to the caller, say renaming the function to just > check_toasted_attribute(), we'd save a level of indentation in that > whole function and probably add a tad of clarity, too. You wouldn't > feel the need to Assert(ctx.toasted_attributes == NIL) in the caller > if the caller had just done list_free(ctx->toasted_attributes); > ctx->toasted_attributes = NIL. You're right. It looks nicer that way. Changed. > - Why are all the changes to the tests in this patch? What do they > have to do with getting the TOAST checks out from under the buffer > lock? I really need you to structure the patch series so that each > patch is about one topic and, equally, so that each topic is only > covered by one patch. Otherwise it's just way too confusing. v18-0001 - Finishes work started in commit 3b6c1259f9 that was overlooked owing to how I had separated the changes in v17-0002 vs. v17-0003 v18-0002 - Postpones the toast checks for a page until after the main table page lock is released v18-0003 - Improves the corruption messages in ways already discussed earlier in this thread. Changes the tests to expect the new messages, but adds no new checks v18-0004 - Adding corruption checks of toast pointers. Extends the regression tests to cover the new checks. > > - I think some of these messages need a bit of word-smithing, too, but > we can leave that for when we're closer to being done with this. Ok. v18-0001-amcheck-remove-duplicate-xid-bounds-checks.patch Description: Binary data v18-0002-amcheck-avoid-extra-work-while-holding-buffer-lo.patch Description: Binary data v18-0003-amcheck-improving-corruption-messages.patch Description: Binary data v18-0004-amcheck-adding-toast-pointer-corruption-checks.patch Description: Binary data — Mark Dilger EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: pg_amcheck contrib application
On Thu, Apr 1, 2021 at 1:41 PM Robert Haas wrote: > OK, committed. We still need to deal with what you had as 0003 > upthread, so I guess the next step is for me to spend some time > reviewing that one. I did this, and it was a bit depressing. It appears that we now have duplicate checks for xmin and xmax being out of the valid range. Somehow you have the removal of those duplicate checks in 0003, but why in the world didn't you put them into one of the previous patches and just make 0003 about fixing the holding-buffer-lock-while-following-TOAST-pointers problem? (And, gah, why did I not look carefully enough to notice that you hadn't done that?) Other than that, I notice a few other things: - There are a total of two (2) calls in the current source code to palloc0fast, and hundreds of calls to palloc0. So I think you should forget about using the fast variant and just do what almost every other caller does. - If you want to make this code faster, a better idea would be to avoid doing all of this allocate and free work and just allocate an array that's guaranteed to be big enough, and then keep track of how many elements of that array are actually in use. - #ifdef DECOMPRESSION_CORRUPTION_CHECKING is not a useful way of introducing such a feature. Either we do it for real and expose it via SQL and pg_amcheck as an optional behavior, or we rip it out and revisit it later. Given the nearness of feature freeze, my vote is for the latter. - I'd remove the USE_LZ4 bit, too. Let's not define the presence of LZ4 data in a non-LZ4-enabled cluster as corruption. If we do, then people will expect to be able to use this to find places where they are dependent on LZ4 if they want to move away from it -- and if we don't recurse into composite datums, that will not actually work. - check_toast_tuple() has an odd and slightly unclear calling convention for which there are no comments. I wonder if it would be better to reverse things and make bool *error the return value and what is now the return value into a pointer argument, but whatever we do I think it needs a few words in the comments. We don't need to slavishly explain every argument -- I think toasttup and ctx and tctx are reasonably clear -- but this is not. - To me it would be more logical to reverse the order of the toast_pointer.va_toastrelid != ctx->rel->rd_rel->reltoastrelid and VARATT_EXTERNAL_GET_EXTSIZE(toast_pointer) > toast_pointer.va_rawsize - VARHDRSZ checks. Whether we're pointing at the correct relation feels more fundamental. - If we moved the toplevel foreach loop in check_toasted_attributes() out to the caller, say renaming the function to just check_toasted_attribute(), we'd save a level of indentation in that whole function and probably add a tad of clarity, too. You wouldn't feel the need to Assert(ctx.toasted_attributes == NIL) in the caller if the caller had just done list_free(ctx->toasted_attributes); ctx->toasted_attributes = NIL. - Is there a reason we need a cross-check on both the number of chunks and on the total size? It seems to me that we should check that each individual chunk has the size we expect, and that the total number of chunks is what we expect. The overall size is then necessarily correct. - Why are all the changes to the tests in this patch? What do they have to do with getting the TOAST checks out from under the buffer lock? I really need you to structure the patch series so that each patch is about one topic and, equally, so that each topic is only covered by one patch. Otherwise it's just way too confusing. - I think some of these messages need a bit of word-smithing, too, but we can leave that for when we're closer to being done with this. -- Robert Haas EDB: http://www.enterprisedb.com
Re: pg_amcheck contrib application
On Thu, Apr 1, 2021 at 1:24 PM Mark Dilger wrote: > > On Apr 1, 2021, at 10:20 AM, Robert Haas wrote: > > OK, let's try that again. > Looks good! OK, committed. We still need to deal with what you had as 0003 upthread, so I guess the next step is for me to spend some time reviewing that one. -- Robert Haas EDB: http://www.enterprisedb.com
Re: pg_amcheck contrib application
> On Apr 1, 2021, at 10:20 AM, Robert Haas wrote: > > On Thu, Apr 1, 2021 at 1:06 PM Mark Dilger > wrote: >> Seems fine other than the typo. > > OK, let's try that again. Looks good! — Mark Dilger EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: pg_amcheck contrib application
On Thu, Apr 1, 2021 at 1:06 PM Mark Dilger wrote: > Seems fine other than the typo. OK, let's try that again. -- Robert Haas EDB: http://www.enterprisedb.com v17-0001-amcheck-Fix-verify_heapam-s-tuple-visibility-che.patch Description: Binary data
Re: pg_amcheck contrib application
> On Apr 1, 2021, at 9:56 AM, Robert Haas wrote: > > On Thu, Apr 1, 2021 at 12:32 PM Mark Dilger > wrote: >>> - If xmax is a multi but seems to be garbled, I changed it to return >>> true rather than false. The inserter is known to have committed by >>> that point, so I think it's OK to try to deform the tuple. We just >>> shouldn't try to check TOAST. >> >> It is hard to know what to do when at least one tuple header field is >> corrupt. You don't necesarily know which one it is. For example, if >> HEAP_XMAX_IS_MULTI is set, we try to interpret the xmax as a mxid, and if it >> is out of bounds, we report it as corrupt. But was the xmax corrupt? Or >> was the HEAP_XMAX_IS_MULTI bit corrupt? It's not clear. I took the view >> that if either xmin or xmax appear to be corrupt when interpreted in light >> of the various tuple header bits, all we really know is that the set of >> fields/bits don't make sense as a whole, so we report corruption, don't >> trust any of them, and abort further checking of the tuple. You have be >> burden of proof the other way around. If the xmin appears fine, and xmax >> appears corrupt, then we only know that xmax is corrupt, so the tuple is >> checkable because according to the xmin it committed. > > I agree that it's hard to be sure what's gone once we start finding > corrupted data, but deciding that maybe xmin didn't really commit > because we see that there's something wrong with xmax seems excessive > to me. I thought about a related case: if xmax is a bad multi but is > also hinted invalid, should we try to follow TOAST pointers? I think > that's hard to say, because we don't know whether (1) the invalid > marking is in error, (2) it's wrong to consider it a multi rather than > an XID, (3) the stored multi got overwritten with a garbage value, or > (4) the stored multi got removed before the tuple was frozen. Not > knowing which of those is the case, how are we supposed to decide > whether the TOAST tuples might have been (or be about to get) pruned? > > But, in the case we're talking about here, I don't think it's a > particularly close decision. All we need to say is that if xmax or the > infomask bits pertaining to it are corrupted, we're still going to > suppose that xmin and the infomask bits pertaining to it, which are > all different bytes and bits, are OK. To me, the contrary decision, > namely that a bogus xmax means xmin was probably lying about the > transaction having been committed in the first place, seems like a > serious overreaction. As you say: > >> I don't think how you have it causes undue problems, since deforming the >> tuple when you shouldn't merely risks a bunch of extra not-so-helpful >> corruption messages. And hey, maybe they're helpful to somebody clever >> enough to diagnose why that particular bit of noise was generated. > > I agree. The biggest risk here is that we might omit >0 complaints > when only 0 are justified. That will panic users. The possibility that > we might omit >x complaints when only x are justified, for some x>0, > is also a risk, but it's not nearly as bad, because there's definitely > something wrong, and it's just a question of what it is exactly. So we > have to be really conservative about saying that X is corruption if > there's any possibility that it might be fine. But once we've > complained about one thing, we can take a more balanced approach about > whether to risk issuing more complaints. The possibility that > suppressing the additional complaints might complicate resolution of > the issue also needs to be considered. This all seems fine to me. The main thing is that we don't go on to check the toast, which we don't. >>* If xmin_status happens to be XID_IN_PROGRESS, then in theory >> >> Did you mean to say XID_IS_CURRENT_XID here? > > Yes, I did, thanks. Ouch. You've got a typo: s/XID_IN_CURRENT_XID/XID_IS_CURRENT_XID/ >> /* xmax is an MXID, not an MXID. Sanity check it. */ >> >> Is it an MXID or isn't it? > > Good catch. > > New patch attached. Seems fine other than the typo. — Mark Dilger EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: pg_amcheck contrib application
On Thu, Apr 1, 2021 at 12:32 PM Mark Dilger wrote: > > - If xmax is a multi but seems to be garbled, I changed it to return > > true rather than false. The inserter is known to have committed by > > that point, so I think it's OK to try to deform the tuple. We just > > shouldn't try to check TOAST. > > It is hard to know what to do when at least one tuple header field is > corrupt. You don't necesarily know which one it is. For example, if > HEAP_XMAX_IS_MULTI is set, we try to interpret the xmax as a mxid, and if it > is out of bounds, we report it as corrupt. But was the xmax corrupt? Or was > the HEAP_XMAX_IS_MULTI bit corrupt? It's not clear. I took the view that if > either xmin or xmax appear to be corrupt when interpreted in light of the > various tuple header bits, all we really know is that the set of fields/bits > don't make sense as a whole, so we report corruption, don't trust any of > them, and abort further checking of the tuple. You have be burden of proof > the other way around. If the xmin appears fine, and xmax appears corrupt, > then we only know that xmax is corrupt, so the tuple is checkable because > according to the xmin it committed. I agree that it's hard to be sure what's gone once we start finding corrupted data, but deciding that maybe xmin didn't really commit because we see that there's something wrong with xmax seems excessive to me. I thought about a related case: if xmax is a bad multi but is also hinted invalid, should we try to follow TOAST pointers? I think that's hard to say, because we don't know whether (1) the invalid marking is in error, (2) it's wrong to consider it a multi rather than an XID, (3) the stored multi got overwritten with a garbage value, or (4) the stored multi got removed before the tuple was frozen. Not knowing which of those is the case, how are we supposed to decide whether the TOAST tuples might have been (or be about to get) pruned? But, in the case we're talking about here, I don't think it's a particularly close decision. All we need to say is that if xmax or the infomask bits pertaining to it are corrupted, we're still going to suppose that xmin and the infomask bits pertaining to it, which are all different bytes and bits, are OK. To me, the contrary decision, namely that a bogus xmax means xmin was probably lying about the transaction having been committed in the first place, seems like a serious overreaction. As you say: > I don't think how you have it causes undue problems, since deforming the > tuple when you shouldn't merely risks a bunch of extra not-so-helpful > corruption messages. And hey, maybe they're helpful to somebody clever > enough to diagnose why that particular bit of noise was generated. I agree. The biggest risk here is that we might omit >0 complaints when only 0 are justified. That will panic users. The possibility that we might omit >x complaints when only x are justified, for some x>0, is also a risk, but it's not nearly as bad, because there's definitely something wrong, and it's just a question of what it is exactly. So we have to be really conservative about saying that X is corruption if there's any possibility that it might be fine. But once we've complained about one thing, we can take a more balanced approach about whether to risk issuing more complaints. The possibility that suppressing the additional complaints might complicate resolution of the issue also needs to be considered. > * If xmin_status happens to be XID_IN_PROGRESS, then in theory > > Did you mean to say XID_IS_CURRENT_XID here? Yes, I did, thanks. > /* xmax is an MXID, not an MXID. Sanity check it. */ > > Is it an MXID or isn't it? Good catch. New patch attached. -- Robert Haas EDB: http://www.enterprisedb.com v16-0001-amcheck-Fix-verify_heapam-s-tuple-visibility-che.patch Description: Binary data
Re: pg_amcheck contrib application
> On Apr 1, 2021, at 8:08 AM, Robert Haas wrote: > > On Wed, Mar 31, 2021 at 12:34 AM Mark Dilger > wrote: >> These changes got lost between v11 and v12. I've put them back, as well as >> updating to use your language. > > Here's an updated patch that includes your 0001 and 0002 plus a bunch > of changes by me. I intend to commit this version unless somebody > spots a problem with it. > > Here are the things I changed: > > - I renamed tuple_can_be_pruned to tuple_could_be_pruned because I > think it does a better job suggesting that we're uncertain about what > will happen. +1 > - I got rid of bool header_garbled and changed it to bool result, > inverting the sense, because it didn't seem useful to have a function > that ended with if (some_boolean) return false; return true when I > could end the function with return some_other_boolean. +1 > - I removed all the one-word comments that said /* corrupt */ or /* > checkable */ because they seemed redundant. Ok. > - In the xmin_status section of check_tuple_visibility(), I got rid of > the xmin_status == XID_IS_CURRENT_XID and xmin_status == > XID_IN_PROGRESS cases because they were redundant with the xmin_status > != XID_COMMITTED case. Ok. > - If xmax is a multi but seems to be garbled, I changed it to return > true rather than false. The inserter is known to have committed by > that point, so I think it's OK to try to deform the tuple. We just > shouldn't try to check TOAST. It is hard to know what to do when at least one tuple header field is corrupt. You don't necesarily know which one it is. For example, if HEAP_XMAX_IS_MULTI is set, we try to interpret the xmax as a mxid, and if it is out of bounds, we report it as corrupt. But was the xmax corrupt? Or was the HEAP_XMAX_IS_MULTI bit corrupt? It's not clear. I took the view that if either xmin or xmax appear to be corrupt when interpreted in light of the various tuple header bits, all we really know is that the set of fields/bits don't make sense as a whole, so we report corruption, don't trust any of them, and abort further checking of the tuple. You have be burden of proof the other way around. If the xmin appears fine, and xmax appears corrupt, then we only know that xmax is corrupt, so the tuple is checkable because according to the xmin it committed. I don't think how you have it causes undue problems, since deforming the tuple when you shouldn't merely risks a bunch of extra not-so-helpful corruption messages. And hey, maybe they're helpful to somebody clever enough to diagnose why that particular bit of noise was generated. > - I changed both switches over xmax_status to break in each case and > then return true after instead of returning true for each case. I > think this is clearer. Ok. > - I changed get_xid_status() to perform the TransactionIdIs... checks > in the same order that HeapTupleSatisfies...() does them. I believe > that it's incorrect to conclude that the transaction must be in > progress because it neither IsCurrentTransaction nor DidCommit nor > DidAbort, because all of those things will be false for a transaction > that is running at the time of a system crash. The correct rule is > that if it neither IsCurrentTransaction nor IsInProgress nor DidCommit > then it's aborted. Ok. > - I moved a few comments and rewrote some others, including some of > the ones that you took from my earlier draft patch. The thing is, the > comment needs to be adjusted based on where you put it. Like, I had a > comment that says"It should be impossible for xvac to still be > running, since we've removed all that code, but even if it were, it > ought to be safe to read the tuple, since the original inserter must > have committed. But, if the xvac transaction committed, this tuple > (and its associated TOAST tuples) could be pruned at any time." which > in my version was right before a TransactionIdDidCommit() test, and > explains why that test is there and why the code does what it does as > a result. But in your version you've moved it to a place where we've > already tested that the transaction has committed, and more > importantly, where we've already tested that it's not still running. > Saying that it "should" be impossible for it not to be running when > we've *actually checked that* doesn't make nearly as much sense as it > does when we haven't checked that and aren't going to do so. * If xmin_status happens to be XID_IN_PROGRESS, then in theory Did you mean to say XID_IS_CURRENT_XID here? /* xmax is an MXID, not an MXID. Sanity check it. */ Is it an MXID or isn't it? — Mark Dilger EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: pg_amcheck contrib application
On Wed, Mar 31, 2021 at 12:34 AM Mark Dilger wrote: > These changes got lost between v11 and v12. I've put them back, as well as > updating to use your language. Here's an updated patch that includes your 0001 and 0002 plus a bunch of changes by me. I intend to commit this version unless somebody spots a problem with it. Here are the things I changed: - I renamed tuple_can_be_pruned to tuple_could_be_pruned because I think it does a better job suggesting that we're uncertain about what will happen. - I got rid of bool header_garbled and changed it to bool result, inverting the sense, because it didn't seem useful to have a function that ended with if (some_boolean) return false; return true when I could end the function with return some_other_boolean. - I removed all the one-word comments that said /* corrupt */ or /* checkable */ because they seemed redundant. - In the xmin_status section of check_tuple_visibility(), I got rid of the xmin_status == XID_IS_CURRENT_XID and xmin_status == XID_IN_PROGRESS cases because they were redundant with the xmin_status != XID_COMMITTED case. - If xmax is a multi but seems to be garbled, I changed it to return true rather than false. The inserter is known to have committed by that point, so I think it's OK to try to deform the tuple. We just shouldn't try to check TOAST. - I changed both switches over xmax_status to break in each case and then return true after instead of returning true for each case. I think this is clearer. - I changed get_xid_status() to perform the TransactionIdIs... checks in the same order that HeapTupleSatisfies...() does them. I believe that it's incorrect to conclude that the transaction must be in progress because it neither IsCurrentTransaction nor DidCommit nor DidAbort, because all of those things will be false for a transaction that is running at the time of a system crash. The correct rule is that if it neither IsCurrentTransaction nor IsInProgress nor DidCommit then it's aborted. - I moved a few comments and rewrote some others, including some of the ones that you took from my earlier draft patch. The thing is, the comment needs to be adjusted based on where you put it. Like, I had a comment that says"It should be impossible for xvac to still be running, since we've removed all that code, but even if it were, it ought to be safe to read the tuple, since the original inserter must have committed. But, if the xvac transaction committed, this tuple (and its associated TOAST tuples) could be pruned at any time." which in my version was right before a TransactionIdDidCommit() test, and explains why that test is there and why the code does what it does as a result. But in your version you've moved it to a place where we've already tested that the transaction has committed, and more importantly, where we've already tested that it's not still running. Saying that it "should" be impossible for it not to be running when we've *actually checked that* doesn't make nearly as much sense as it does when we haven't checked that and aren't going to do so. -- Robert Haas EDB: http://www.enterprisedb.com v15-0001-amcheck-Fix-verify_heapam-s-tuple-visibility-che.patch Description: Binary data
Re: pg_amcheck contrib application
On Wed, Mar 31, 2021 at 1:44 PM Mark Dilger wrote: > I read "exclusively locks" as meaning it takes an ExclusiveLock, but the code > shows that it takes an AccessExclusiveLock. I think the docs are pretty > misleading here, though I understand that grammatically it is hard to say > "accessively exclusively locks" or such. But a part of my analysis was based > on the reasoning that if VF only takes an ExclusiveLock, then there must be > concurrent readers possible. VF went away long enough ago that I had > forgotten exactly how inconvenient it was. It kinda depends on what you mean by concurrent readers, because a transaction that could start on Monday and acquire an XID, and then on Tuesday you could run VACUUM FULL on relation "foo", and then on Wednesday the transaction from before could get around to reading some data from "foo". The two transactions are concurrent, in the sense that the 3-day transaction was running before the VACUUM FULL, was still running after VACUUM FULL, read the same pages that the VACUUM FULL modified, and cares whether the XID from the VACUUM FULL committed or aborted. But, it's not concurrent in the sense that you never have a situation where the VACUUM FULL does some of its modifications, then an overlapping transaction sees them, and then it does the rest of them. -- Robert Haas EDB: http://www.enterprisedb.com
Re: pg_amcheck contrib application
> On Mar 31, 2021, at 10:31 AM, Mark Dilger > wrote: > > > >> On Mar 31, 2021, at 10:11 AM, Robert Haas wrote: >> >> On Wed, Mar 31, 2021 at 12:34 AM Mark Dilger >> wrote: >>> I'm not looking at the old VACUUM FULL code, but my assumption is that if >>> the xvac code were resurrected, then when a tuple is moved off by a VACUUM >>> FULL, the old tuple and associated toast cannot be pruned until concurrent >>> transactions end. So, if amcheck is running more-or-less concurrently with >>> the VACUUM FULL and has a snapshot xmin no newer than the xid of the VACUUM >>> FULL's xid, it can check the toast associated with the moved off tuple >>> after the VACUUM FULL commits. If instead the VACUUM FULL xid was older >>> than amcheck's xmin, then the toast is in danger of being vacuumed away. >>> So the logic in verify_heapam would need to change to think about this >>> distinction. We don't have to concern ourselves about that, because VACUUM >>> FULL cannot be running, and so the xid for it must be older than our xmin, >>> and hence the toast is unconditionally not safe to check. >>> >>> I'm changing the comments back to how you had them, but I'd like to know >>> why my reasoning is wrong. >> >> Let's start by figuring out *whether* your reasoning is wrong. My >> assumption was that old-style VACUUM FULL would move tuples without >> retoasting. That is, if we decided to move a tuple from page 2 of the >> main table to page 1, we would just write the tuple into page 1, >> marking it moved-in, and on page 2 we would mark it moved-off. And >> that we would not examine or follow any TOAST pointers at all, so >> whatever TOAST entries existed would be shared between the two tuples. >> One tuple or the other would eventually die, depending on whether xvac >> went on to commit or abort, but either way the TOAST doesn't need >> updating because there's always exactly 1 remaining tuple using >> pointers to those TOAST values. >> >> Your assumption seems to be the opposite, that the TOASTed values >> would be retoasted as part of VF. If that is true, then your idea is >> right. >> >> Do you agree with this analysis? If so, we can check the code and find >> out which way it actually worked. > > Actually, that makes a lot of sense without even looking at the old code. I > was implicitly assuming that the toast table was undergoing a VF also, and > that the toast pointers in the main table tuples would be updated to point to > the new location, so we'd be unable to follow the pointers to the old > location without danger of the old location entries being vacuumed away. But > if the main table tuples get moved while keeping their toast pointers > unaltered, then you don't have to worry about that, although you do have to > worry that a VF of the main table doesn't help so much with toast table bloat. > > We're only discussing this in order to craft the right comment for a bit of > code with respect to a hypothetical situation in which VF gets resurrected, > so I'm not sure this should be top priority, but I'm curious enough now to go > read the old code Well, that's annoying. The documentation of postgres 8.2 for vacuum full [1] says, Selects "full" vacuum, which may reclaim more space, but takes much longer and exclusively locks the table. I read "exclusively locks" as meaning it takes an ExclusiveLock, but the code shows that it takes an AccessExclusiveLock. I think the docs are pretty misleading here, though I understand that grammatically it is hard to say "accessively exclusively locks" or such. But a part of my analysis was based on the reasoning that if VF only takes an ExclusiveLock, then there must be concurrent readers possible. VF went away long enough ago that I had forgotten exactly how inconvenient it was. [1] https://www.postgresql.org/docs/8.2/sql-vacuum.html — Mark Dilger EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: pg_amcheck contrib application
On Wed, Mar 31, 2021 at 1:31 PM Mark Dilger wrote: > Actually, that makes a lot of sense without even looking at the old code. I > was implicitly assuming that the toast table was undergoing a VF also, and > that the toast pointers in the main table tuples would be updated to point to > the new location, so we'd be unable to follow the pointers to the old > location without danger of the old location entries being vacuumed away. But > if the main table tuples get moved while keeping their toast pointers > unaltered, then you don't have to worry about that, although you do have to > worry that a VF of the main table doesn't help so much with toast table bloat. > > We're only discussing this in order to craft the right comment for a bit of > code with respect to a hypothetical situation in which VF gets resurrected, > so I'm not sure this should be top priority, but I'm curious enough now to go > read the old code Right, well, we wouldn't be PostgreSQL hackers if we didn't spend lots of time worrying about obscure details. Whether that's good software engineering or mere pedantry is sometimes debatable. I took a look at commit 0a469c87692d15a22eaa69d4b3a43dd8e278dd64, which removed old-style VACUUM FULL, and AFAICS, it doesn't contain any references to tuple deforming, varlena, HeapTupleHasExternal, or anything else that would make me think it has the foggiest idea whether the tuples it's moving around contain TOAST pointers, so I think I had the right idea. -- Robert Haas EDB: http://www.enterprisedb.com
Re: pg_amcheck contrib application
> On Mar 31, 2021, at 10:11 AM, Robert Haas wrote: > > On Wed, Mar 31, 2021 at 12:34 AM Mark Dilger > wrote: >> I'm not looking at the old VACUUM FULL code, but my assumption is that if >> the xvac code were resurrected, then when a tuple is moved off by a VACUUM >> FULL, the old tuple and associated toast cannot be pruned until concurrent >> transactions end. So, if amcheck is running more-or-less concurrently with >> the VACUUM FULL and has a snapshot xmin no newer than the xid of the VACUUM >> FULL's xid, it can check the toast associated with the moved off tuple after >> the VACUUM FULL commits. If instead the VACUUM FULL xid was older than >> amcheck's xmin, then the toast is in danger of being vacuumed away. So the >> logic in verify_heapam would need to change to think about this distinction. >> We don't have to concern ourselves about that, because VACUUM FULL cannot >> be running, and so the xid for it must be older than our xmin, and hence the >> toast is unconditionally not safe to check. >> >> I'm changing the comments back to how you had them, but I'd like to know why >> my reasoning is wrong. > > Let's start by figuring out *whether* your reasoning is wrong. My > assumption was that old-style VACUUM FULL would move tuples without > retoasting. That is, if we decided to move a tuple from page 2 of the > main table to page 1, we would just write the tuple into page 1, > marking it moved-in, and on page 2 we would mark it moved-off. And > that we would not examine or follow any TOAST pointers at all, so > whatever TOAST entries existed would be shared between the two tuples. > One tuple or the other would eventually die, depending on whether xvac > went on to commit or abort, but either way the TOAST doesn't need > updating because there's always exactly 1 remaining tuple using > pointers to those TOAST values. > > Your assumption seems to be the opposite, that the TOASTed values > would be retoasted as part of VF. If that is true, then your idea is > right. > > Do you agree with this analysis? If so, we can check the code and find > out which way it actually worked. Actually, that makes a lot of sense without even looking at the old code. I was implicitly assuming that the toast table was undergoing a VF also, and that the toast pointers in the main table tuples would be updated to point to the new location, so we'd be unable to follow the pointers to the old location without danger of the old location entries being vacuumed away. But if the main table tuples get moved while keeping their toast pointers unaltered, then you don't have to worry about that, although you do have to worry that a VF of the main table doesn't help so much with toast table bloat. We're only discussing this in order to craft the right comment for a bit of code with respect to a hypothetical situation in which VF gets resurrected, so I'm not sure this should be top priority, but I'm curious enough now to go read the old code — Mark Dilger EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: pg_amcheck contrib application
On Wed, Mar 31, 2021 at 12:34 AM Mark Dilger wrote: > I'm not looking at the old VACUUM FULL code, but my assumption is that if the > xvac code were resurrected, then when a tuple is moved off by a VACUUM FULL, > the old tuple and associated toast cannot be pruned until concurrent > transactions end. So, if amcheck is running more-or-less concurrently with > the VACUUM FULL and has a snapshot xmin no newer than the xid of the VACUUM > FULL's xid, it can check the toast associated with the moved off tuple after > the VACUUM FULL commits. If instead the VACUUM FULL xid was older than > amcheck's xmin, then the toast is in danger of being vacuumed away. So the > logic in verify_heapam would need to change to think about this distinction. > We don't have to concern ourselves about that, because VACUUM FULL cannot be > running, and so the xid for it must be older than our xmin, and hence the > toast is unconditionally not safe to check. > > I'm changing the comments back to how you had them, but I'd like to know why > my reasoning is wrong. Let's start by figuring out *whether* your reasoning is wrong. My assumption was that old-style VACUUM FULL would move tuples without retoasting. That is, if we decided to move a tuple from page 2 of the main table to page 1, we would just write the tuple into page 1, marking it moved-in, and on page 2 we would mark it moved-off. And that we would not examine or follow any TOAST pointers at all, so whatever TOAST entries existed would be shared between the two tuples. One tuple or the other would eventually die, depending on whether xvac went on to commit or abort, but either way the TOAST doesn't need updating because there's always exactly 1 remaining tuple using pointers to those TOAST values. Your assumption seems to be the opposite, that the TOASTed values would be retoasted as part of VF. If that is true, then your idea is right. Do you agree with this analysis? If so, we can check the code and find out which way it actually worked. -- Robert Haas EDB: http://www.enterprisedb.com
Re: pg_amcheck contrib application
> On Mar 30, 2021, at 12:45 PM, Robert Haas wrote: > > On Mon, Mar 29, 2021 at 7:16 PM Mark Dilger > wrote: >> Sure, here are four patches which do the same as the single v12 patch did. > > Thanks. Here are some comments on 0003 and 0004: > > When you posted v11, you said that "Rather than print out all four > toast pointer fields for each toast failure, va_rawsize, va_extsize, > and va_toastrelid are only mentioned in the corruption message if they > are related to the specific corruption. Otherwise, just the > va_valueid is mentioned in the corruption message." I like that > principal; in fact, as you know, I suggested it. But, with the v13 > patches applied, exactly zero of the callers to > report_toast_corruption() appear to be following it, because none of > them include the value ID. I think you need to revise the messages, > e.g. These changes got lost between v11 and v12. I've put them back, as well as updating to use your language. > "toasted value for attribute %u missing from toast table" -> > "toast value %u not found in toast table"; Changed. > "final toast chunk number > %u differs from expected value %u" -> "toast value %u was expected to > end at chunk %u, but ended at chunk %u"; Changed. > "toast chunk sequence number > is null" -> "toast value %u has toast chunk with null sequence > number". Changed. > In the first of those example cases, I think you need not > mention the attribute number because it's already there in its own > column. Correct. I'd removed that but lost that work in v12. > On a related note, it doesn't look like you are actually checking > va_toastrelid here. Doing so seems like it would be a good idea. It > also seems like it would be good to check that the compressed size is > less than or equal to the uncompressed size. Yeah, those checks were in v11 but got lost when I changed things for v12. They are back in v14. > I do not like the name tuple_is_volatile, because volatile has a > couple of meanings already, and this isn't one of them. A > SQL-callable function is volatile if it might return different outputs > given the same inputs, even within the same SQL statement. A C > variable is volatile if it might be magically modified in ways not > known to the compiler. I had suggested tuple_cannot_die_now, which is > closer to the mark. If you want to be even more precise, you could > talk about whether the tuple is potentially prunable (e.g. > tuple_can_be_pruned, which inverts the sense). That's really what > we're worried about: whether MVCC rules would permit the tuple to be > pruned after we release the buffer lock and before we check TOAST. I used "tuple_can_be_pruned". I didn't like "tuple_cannot_die_now", and still don't like that name, as it has several wrong interpretations. One meaning of "cannot die now" is that it has become immortal. Another is "cannot be deleted from the table". > I would ideally prefer not to rename report_corruption(). The old name > is clearer, and changing it produces a bunch of churn that I'd rather > avoid. Perhaps the common helper function could be called > report_corruption_internal(), and the callers could be > report_corruption() and report_toast_corruption(). Yes, hence the commit message in the previous patch set, "This patch can probably be left out if the committer believes it creates more git churn than it is worth." I've removed this patch from this next patch set, and used the function names you suggest. > Regarding 0001 and 0002, I think the logic in 0002 looks a lot closer > to correct now, but I want to go through it in more detail. I think, > though, that you've made some of my comments worse. For example, I > wrote: "It should be impossible for xvac to still be running, since > we've removed all that code, but even if it were, it ought to be safe > to read the tuple, since the original inserter must have committed. > But, if the xvac transaction committed, this tuple (and its associated > TOAST tuples) could be pruned at any time." You changed that to read > "We don't bother comparing against safe_xmin because the VACUUM FULL > must have committed prior to an upgrade and can't still be running." > Your comment is shorter, which is a point in its favor, but what I was > trying to emphasize is that the logic would be correct EVEN IF we > again started to use HEAP_MOVED_OFF and HEAP_MOVED_IN again. Your > version makes it sound like the code would need to be revised in that > case. If that's true, then my comment was wrong, but I didn't think it > was true, or I wouldn't have written the comment in that way. I think the logic would have to change if we brought back the old VACUUM FULL behavior. I'm not looking at the old VACUUM FULL code, but my assumption is that if the xvac code were resurrected, then when a tuple is moved off by a VACUUM FULL, the old tuple and associated toast cannot be pruned until concurrent transactions end. So, if amcheck is running more-or-less
Re: pg_amcheck contrib application
On Mon, Mar 29, 2021 at 7:16 PM Mark Dilger wrote: > Sure, here are four patches which do the same as the single v12 patch did. Thanks. Here are some comments on 0003 and 0004: When you posted v11, you said that "Rather than print out all four toast pointer fields for each toast failure, va_rawsize, va_extsize, and va_toastrelid are only mentioned in the corruption message if they are related to the specific corruption. Otherwise, just the va_valueid is mentioned in the corruption message." I like that principal; in fact, as you know, I suggested it. But, with the v13 patches applied, exactly zero of the callers to report_toast_corruption() appear to be following it, because none of them include the value ID. I think you need to revise the messages, e.g. "toasted value for attribute %u missing from toast table" -> "toast value %u not found in toast table"; "final toast chunk number %u differs from expected value %u" -> "toast value %u was expected to end at chunk %u, but ended at chunk %u"; "toast chunk sequence number is null" -> "toast value %u has toast chunk with null sequence number". In the first of those example cases, I think you need not mention the attribute number because it's already there in its own column. On a related note, it doesn't look like you are actually checking va_toastrelid here. Doing so seems like it would be a good idea. It also seems like it would be good to check that the compressed size is less than or equal to the uncompressed size. I do not like the name tuple_is_volatile, because volatile has a couple of meanings already, and this isn't one of them. A SQL-callable function is volatile if it might return different outputs given the same inputs, even within the same SQL statement. A C variable is volatile if it might be magically modified in ways not known to the compiler. I had suggested tuple_cannot_die_now, which is closer to the mark. If you want to be even more precise, you could talk about whether the tuple is potentially prunable (e.g. tuple_can_be_pruned, which inverts the sense). That's really what we're worried about: whether MVCC rules would permit the tuple to be pruned after we release the buffer lock and before we check TOAST. I would ideally prefer not to rename report_corruption(). The old name is clearer, and changing it produces a bunch of churn that I'd rather avoid. Perhaps the common helper function could be called report_corruption_internal(), and the callers could be report_corruption() and report_toast_corruption(). Regarding 0001 and 0002, I think the logic in 0002 looks a lot closer to correct now, but I want to go through it in more detail. I think, though, that you've made some of my comments worse. For example, I wrote: "It should be impossible for xvac to still be running, since we've removed all that code, but even if it were, it ought to be safe to read the tuple, since the original inserter must have committed. But, if the xvac transaction committed, this tuple (and its associated TOAST tuples) could be pruned at any time." You changed that to read "We don't bother comparing against safe_xmin because the VACUUM FULL must have committed prior to an upgrade and can't still be running." Your comment is shorter, which is a point in its favor, but what I was trying to emphasize is that the logic would be correct EVEN IF we again started to use HEAP_MOVED_OFF and HEAP_MOVED_IN again. Your version makes it sound like the code would need to be revised in that case. If that's true, then my comment was wrong, but I didn't think it was true, or I wouldn't have written the comment in that way. Also, and maybe this is a job for a separate patch, but then again maybe not, I wonder if it's really a good idea for get_xid_status to return both a XidBoundsViolation and an XidCommitStatus. It seems to me (without checking all that carefully) that it might be better to just flatten all of that into a single enum, because right now it seems like you often end up with two consecutive switch statements where, perhaps, just one would suffice. -- Robert Haas EDB: http://www.enterprisedb.com
Re: pg_amcheck contrib application
> On Mar 29, 2021, at 1:06 PM, Robert Haas wrote: > > Hmm, so this got ~10x bigger from my version. Could you perhaps > separate it out into a series of patches for easier review? Say, one > that just fixes the visibility logic, and then a second to avoid doing > the TOAST check with a buffer lock held, and then more than that if > there are other pieces that make sense to separate out? Sure, here are four patches which do the same as the single v12 patch did. v13-0001-Refactoring-function-check_tuple_header_and_visi.patch Description: Binary data v13-0002-Replacing-implementation-of-check_tuple_visibili.patch Description: Binary data v13-0003-Renaming-report_corruption-as-report_main_corrup.patch Description: Binary data v13-0004-Checking-toast-separately-from-the-main-table.patch Description: Binary data — Mark Dilger EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: pg_amcheck contrib application
On Mon, Mar 29, 2021 at 1:45 PM Mark Dilger wrote: > Thanks! The attached patch addresses your comments here and in your prior > email. In particular, this patch changes the tuple visibility logic to not > check tuples for which the inserting transaction aborted or is still in > progress, and to not check toast for tuples deleted in transactions older > than our transaction snapshot's xmin. A list of toasted attributes which are > safe to check is compiled per main table page during the scan of the page, > then checked after the buffer lock on the main page is released. > > In the perhaps unusual case where verify_heapam() is called in a transaction > which has also added tuples to the table being checked, this patch's > visibility logic chooses not to check such tuples. I'm on the fence about > this choice, and am mostly following your lead. I like that this decision > maintains the invariant that we never check tuples which have not yet been > committed. > > The patch includes a bit of refactoring. In the old code, heap_check() > performed clog bounds checking on xmin and xmax prior to calling > check_tuple_header_and_visibilty(), but I think that's not such a great > choice. If the tuple header is garbled to have random bytes in the xmin and > xmax fields, and we can detect that situation because other tuple header > fields are garbled in detectable ways, I'd rather get a report about the > header being garbled than a report about the xmin or xmax being out of > bounds. In the new code, the tuple header is checked first, then the > visibility is checked, then the tuple is checked against the current relation > description, then the tuple attributes are checked. I think the layout is > easier to follow, too. Hmm, so this got ~10x bigger from my version. Could you perhaps separate it out into a series of patches for easier review? Say, one that just fixes the visibility logic, and then a second to avoid doing the TOAST check with a buffer lock held, and then more than that if there are other pieces that make sense to separate out? -- Robert Haas EDB: http://www.enterprisedb.com
Re: pg_amcheck contrib application
> On Mar 24, 2021, at 1:46 PM, Robert Haas wrote: > > Mark, > > Here's a quick and very dirty sketch of what I think perhaps this > logic could look like. This is pretty much untested and it might be > buggy, but at least you can see whether we're thinking at all in the > same direction. Thanks! The attached patch addresses your comments here and in your prior email. In particular, this patch changes the tuple visibility logic to not check tuples for which the inserting transaction aborted or is still in progress, and to not check toast for tuples deleted in transactions older than our transaction snapshot's xmin. A list of toasted attributes which are safe to check is compiled per main table page during the scan of the page, then checked after the buffer lock on the main page is released. In the perhaps unusual case where verify_heapam() is called in a transaction which has also added tuples to the table being checked, this patch's visibility logic chooses not to check such tuples. I'm on the fence about this choice, and am mostly following your lead. I like that this decision maintains the invariant that we never check tuples which have not yet been committed. The patch includes a bit of refactoring. In the old code, heap_check() performed clog bounds checking on xmin and xmax prior to calling check_tuple_header_and_visibilty(), but I think that's not such a great choice. If the tuple header is garbled to have random bytes in the xmin and xmax fields, and we can detect that situation because other tuple header fields are garbled in detectable ways, I'd rather get a report about the header being garbled than a report about the xmin or xmax being out of bounds. In the new code, the tuple header is checked first, then the visibility is checked, then the tuple is checked against the current relation description, then the tuple attributes are checked. I think the layout is easier to follow, too. v12-0001-Fix-visibility-and-locking-issues.patch Description: Binary data — Mark Dilger EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: pg_amcheck contrib application
Mark, Here's a quick and very dirty sketch of what I think perhaps this logic could look like. This is pretty much untested and it might be buggy, but at least you can see whether we're thinking at all in the same direction. -- Robert Haas EDB: http://www.enterprisedb.com very-rough-visibility-ideas.patch Description: Binary data
Re: pg_amcheck contrib application
On Wed, Mar 24, 2021 at 9:12 AM Robert Haas wrote: > On Wed, Mar 24, 2021 at 2:13 AM Mark Dilger > wrote: > > The visibility rules fix is different in v11, relying on a visibility check > > which more closely follows the implementation of > > HeapTupleSatisfiesVacuumHorizon. > > Hmm. The header comment you wrote says "If a tuple might not be > visible to any running transaction, then we must not check it." But, I > don't find that statement very clear: does it mean "if there could be > even one transaction to which this tuple is not visible, we must not > check it"? Or does it mean "if the number of transactions that can see > this tuple could potentially be zero, then we must not check it"? I > don't think either of those is actually what we care about. I think > what we should be saying is "if the tuple could have been inserted by > a transaction that also added a column to the table, but which > ultimately did not commit, then the table's current TupleDesc might > differ from the one used to construct this tuple, so we must not check > it." Hit send too soon. And I was wrong, too. Wahoo. Thinking about the buildfarm failure, I realized that there's a second danger here, unrelated to the possibility of different TupleDescs, which we talked about before: if the tuple is dead, we can't safely follow any TOAST pointers, because the TOAST chunks might disappear at any time. So technically we could split the return value up into something three-way: if the inserted is known to have committed, we can check the tuple itself, because the TupleDesc has to be reasonable. And, if the tuple is known not to be dead already, and known not to be in a state where it could become dead while we're doing stuff, we can follow the TOAST pointer. I'm not sure whether it's worth trying to be that fancy or not. If we were only concerned about the mismatched-TupleDesc problem, this function could return true in a lot more cases. Once we get to the comment that says "Okay, the inserter committed..." we could just return true. Similarly, the HEAP_MOVED_IN and HEAP_MOVED_OFF cases could just skip all the interior test and return true, because if the tuple is being moved, the original inserter has to have committed. Conversely, however, the !HeapTupleHeaderXminCommitted -> TransactionIdIsCurrentTransactionId case probably ought to always return false. One could argue otherwise: if we're the inserter, then the only in-progress transaction that might have changed the TupleDesc is us, so we could just consider this case to be a true return value also, regardless of what's going on with xmax. In that case, we're not asking "did the inserter definitely commit?" but "are the inserter's possible DDL changes definitely visible to us?" which might be an OK definition too. However, the could-the-TOAST-data-disappear problem is another story. I don't see how we can answer that question correctly with the logic you've got here, because you have no XID threshold. Consider the case where we reach this code: + if (!(tuphdr->t_infomask & HEAP_XMAX_COMMITTED)) + { + if (TransactionIdIsInProgress(HeapTupleHeaderGetRawXmax(tuphdr))) + return true;/* HEAPTUPLE_DELETE_IN_PROGRESS */ + else if (!TransactionIdDidCommit(HeapTupleHeaderGetRawXmax(tuphdr))) + + /* +* Not in Progress, Not Committed, so either Aborted or crashed +*/ + return true;/* HEAPTUPLE_LIVE */ + + /* At this point the xmax is known committed */ + } If we reach the case where the code comment says HEAPTUPLE_DELETE_IN_PROGRESS, we know that the tuple isn't dead right now, and so the TOAST tuples aren't dead either. But, by the time we go try to look at the TOAST tuples, they might have become dead and been pruned away, because the deleting transaction can commit at any time, and after that pruning can happen at any time. Our only guarantee that that won't happen is if the deleting XID is new enough that it's invisible to some snapshot that our backend has registered. That's approximately why HeapTupleSatisfiesVacuumHorizon needs to set *dead_after in this case and one other, and I think you have the same requirement. I just noticed that this whole thing has another, related problem: check_tuple_header_and_visibilty() and check_tuple_attribute() are called from within check_tuple(), which is called while we hold a buffer lock on the heap page. We should not be going and doing complex operations that might take their own buffer locks - like TOAST index checks - while we're holding an lwlock. That's going to have to be changed so that the TOAST pointer checking happens after UnlockReleaseBuffer(); in other words, we'll need to remember the TOAST pointers to go look up and actually look them up after UnlockReleaseBuffer(). But, when we do that, then the HEAPTUPLE_LIVE case above has the
Re: pg_amcheck contrib application
On Wed, Mar 24, 2021 at 2:13 AM Mark Dilger wrote: > The visibility rules fix is different in v11, relying on a visibility check > which more closely follows the implementation of > HeapTupleSatisfiesVacuumHorizon. Hmm. The header comment you wrote says "If a tuple might not be visible to any running transaction, then we must not check it." But, I don't find that statement very clear: does it mean "if there could be even one transaction to which this tuple is not visible, we must not check it"? Or does it mean "if the number of transactions that can see this tuple could potentially be zero, then we must not check it"? I don't think either of those is actually what we care about. I think what we should be saying is "if the tuple could have been inserted by a transaction that also added a column to the table, but which ultimately did not commit, then the table's current TupleDesc might differ from the one used to construct this tuple, so we must not check it." -- Robert Haas EDB: http://www.enterprisedb.com
Re: pg_amcheck contrib application
> On Mar 17, 2021, at 9:00 PM, Mark Dilger wrote: > > Of the toast pointer fields: > >int32 va_rawsize; /* Original data size (includes header) */ >int32 va_extsize; /* External saved size (doesn't) */ >Oid va_valueid; /* Unique ID of value within TOAST table */ >Oid va_toastrelid; /* RelID of TOAST table containing it */ > > all seem worth getting as part of any toast error message, even if these > fields themselves are not corrupt. It just makes it easier to understand the > context of the error you're looking at. At first I tried putting these into > each message, but it is very wordy to say things like "toast pointer with > rawsize %u and extsize %u pointing at relation with oid %u" and such. It > made more sense to just add these four fields to the verify_heapam tuple > format. That saves putting them in the message text itself, and has the > benefit that you could filter the rows coming from verify_heapam() for ones > where valueid is or is not null, for example. This changes the external > interface of verify_heapam, but I didn't bother with a amcheck--1.3--1.4.sql > because amcheck--1.2--1.3. sql was added as part of the v14 development work > and has not yet been released. My assumption is that I can just change it, > rather than making a new upgrade file. > > These patches fix the visibility rules and add extra toast checking. These new patches address the same issues as v9 (which was never committed), and v10 (which was never even posted to this list), with some changes. Rather than print out all four toast pointer fields for each toast failure, va_rawsize, va_extsize, and va_toastrelid are only mentioned in the corruption message if they are related to the specific corruption. Otherwise, just the va_valueid is mentioned in the corruption message. The visibility rules fix is different in v11, relying on a visibility check which more closely follows the implementation of HeapTupleSatisfiesVacuumHorizon. v11-0001-Fixing-amcheck-tuple-visibility-rules.patch Description: Binary data v11-0002-pg_amcheck-extend-toast-corruption-reports.patch Description: Binary data — Mark Dilger EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: pg_amcheck contrib application
On Tue, Mar 23, 2021 at 12:53 PM Peter Geoghegan wrote: > One of the advantages of this design is that we verify practically all > of the work involved in deleting an entire subtree up-front, inside > _bt_lock_subtree_parent(). It's clearly safe to back out of it if it > looks dicey. That's taken care of. I just pushed a commit that teaches _bt_lock_subtree_parent() to press on. -- Peter Geoghegan
Re: pg_amcheck contrib application
On Tue, Mar 23, 2021 at 12:44 PM Tom Lane wrote: > > I will make this change to HEAD soon, barring objections. > > +1. Not deleting the upper page seems better than the alternatives. FWIW it might also work that way as a holdover from the old page deletion algorithm. These days we decide exactly which pages (leaf page plus possible internal pages) are to be deleted as a whole up front (these are a subtree, though usually just a degenerate single-leaf-page subtree -- internal page deletions are rare). One of the advantages of this design is that we verify practically all of the work involved in deleting an entire subtree up-front, inside _bt_lock_subtree_parent(). It's clearly safe to back out of it if it looks dicey. -- Peter Geoghegan
Re: pg_amcheck contrib application
Peter Geoghegan writes: > That being said, I should make _bt_lock_subtree_parent() return false > and back out of page deletion without raising an error in the case > where we really cannot locate a valid downlink. We really ought to > soldier on when that happens, since we'll do that for a bunch of other > reasons already. I believe that the only reason we throw an error > today is for parity with the page split case (the main > _bt_getstackbuf() call). But this isn't the same situation at all -- > this is VACUUM. > I will make this change to HEAD soon, barring objections. +1. Not deleting the upper page seems better than the alternatives. regards, tom lane
Re: pg_amcheck contrib application
On Tue, Mar 23, 2021 at 12:05 PM Robert Haas wrote: > Right, good point. But... does that really apply to > 005_opclass_damage.pl? I feel like with the kind of physical damage > we're doing in 003_check.pl, it makes a lot of sense to stop vacuum > from crashing headlong into that table. But, 005 is doing "logical" > damage rather than "physical" damage, and I don't see why autovacuum > should misbehave in that kind of case. In fact, the fact that > autovacuum can handle such cases is one of the selling points for the > whole design of vacuum, as opposed to, for example, retail index > lookups. FWIW that is only 99.9% true (contrary to what README.HOT says). This is the case because nbtree page deletion will in fact search the tree to find a downlink to the target page, which must be removed at the same time -- see the call to _bt_search() made within nbtpage.c. This is much less of a problem than you'd think, though, even with an opclass that gives wrong answers all the time. Because it's also true that _bt_getstackbuf() is remarkably tolerant when it actually goes to locate the downlink -- because that happens via a linear search that matches on downlink block number (it doesn't use the opclass for that part). This means that we'll accidentally fail to fail if the page is *somewhere* to the right in the "true" key space. Which probably means that it has a greater than 50% chance of not failing with a 100% broken opclass. Which probably makes our odds better with more plausible levels of misbehavior (e.g. collation changes). That being said, I should make _bt_lock_subtree_parent() return false and back out of page deletion without raising an error in the case where we really cannot locate a valid downlink. We really ought to soldier on when that happens, since we'll do that for a bunch of other reasons already. I believe that the only reason we throw an error today is for parity with the page split case (the main _bt_getstackbuf() call). But this isn't the same situation at all -- this is VACUUM. I will make this change to HEAD soon, barring objections. -- Peter Geoghegan
Re: pg_amcheck contrib application
> On Mar 23, 2021, at 12:05 PM, Robert Haas wrote: > > 005 is doing "logical" > damage rather than "physical" damage, and I don't see why autovacuum > should misbehave in that kind of case. In fact, the fact that > autovacuum can handle such cases is one of the selling points for the > whole design of vacuum, as opposed to, for example, retail index > lookups. That is a good point. Checking that autovacuum behaves sensibly despite sort order breakage sounds reasonable, but test 005 doesn't do that reliably, because it does nothing to make sure that autovacuum runs against the affected table during the short window when the affected table exists. All the same, I don't see that turning autovacuum off is required. If autovacuum is broken in this regard, we may get occasional, hard to reproduce build farm failures, but that would be more informative than no failures at all. — Mark Dilger EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: pg_amcheck contrib application
On Thu, Mar 18, 2021 at 12:12 AM Tom Lane wrote: > Mark Dilger writes: > >> On Mar 16, 2021, at 12:52 PM, Robert Haas wrote: > >> Since we now know that shutting autovacuum off makes the problem go > >> away, I don't see a reason to commit 0001. We should fix pg_amcheck > >> instead, if, as presently seems to be the case, that's where the > >> problem is. > > > If you get unlucky, autovacuum will process one of the tables that the test > > intentionally corrupted, with bad consequences, ultimately causing build > > farm intermittent test failures. > > Um, yeah, the test had better shut off autovacuum on any table that > it intentionally corrupts. Right, good point. But... does that really apply to 005_opclass_damage.pl? I feel like with the kind of physical damage we're doing in 003_check.pl, it makes a lot of sense to stop vacuum from crashing headlong into that table. But, 005 is doing "logical" damage rather than "physical" damage, and I don't see why autovacuum should misbehave in that kind of case. In fact, the fact that autovacuum can handle such cases is one of the selling points for the whole design of vacuum, as opposed to, for example, retail index lookups. Pending resolution of that question, I've committed the change to disable autovacuum in 003, and also Mark's changes to have it also run pg_amcheck BEFORE corrupting anything, so the post-corruption tests fail - say by finding the wrong kind of corruption - we can see whether it was also failing before the corruption was even introduced. -- Robert Haas EDB: http://www.enterprisedb.com
Re: pg_amcheck contrib application
Mark Dilger writes: >> On Mar 16, 2021, at 12:52 PM, Robert Haas wrote: >> Since we now know that shutting autovacuum off makes the problem go >> away, I don't see a reason to commit 0001. We should fix pg_amcheck >> instead, if, as presently seems to be the case, that's where the >> problem is. > If you get unlucky, autovacuum will process one of the tables that the test > intentionally corrupted, with bad consequences, ultimately causing build farm > intermittent test failures. Um, yeah, the test had better shut off autovacuum on any table that it intentionally corrupts. regards, tom lane
Re: pg_amcheck contrib application
> On Mar 16, 2021, at 12:52 PM, Robert Haas wrote: > > On Mon, Mar 15, 2021 at 10:10 PM Mark Dilger > wrote: >> It is unfortunate that the failing test only runs pg_amcheck after creating >> numerous corruptions, as we can't know if pg_amcheck would have complained >> about pg_statistic before the corruptions were created in other tables, or >> if it only does so after. The attached patch v7-0003 adds a call to >> pg_amcheck after all tables are created and populated, but before any >> corruptions are caused. This should help narrow down what is happening, and >> doesn't hurt to leave in place long-term. >> >> I don't immediately see anything wrong with how pg_statistic uses a >> pseudo-type, but it leads me to want to poke a bit more at pg_statistic on >> hornet and tern, though I don't have any regression tests specifically for >> doing so. >> >> Tests v7-0001 and v7-0002 are just repeats of the tests posted previously. > > Since we now know that shutting autovacuum off makes the problem go > away, I don't see a reason to commit 0001. We should fix pg_amcheck > instead, if, as presently seems to be the case, that's where the > problem is. If you get unlucky, autovacuum will process one of the tables that the test intentionally corrupted, with bad consequences, ultimately causing build farm intermittent test failures. We could wait to see if this ever happens before fixing it, if you prefer. I'm not sure what that buys us, though. The right approach, I think, is to extend the contrib/amcheck tests to include regressing this particular case to see if it fails. I've written that test and verified that it fails against the old code and passes against the new. > I just committed 0002. Thanks! > I think 0003 is probably a good idea, but I haven't committed it yet. It won't do anything for us in this particular case, but it might make debugging failures quicker in the future. > As for 0004, it seems to me that we might want to do a little more > rewording of these messages and perhaps we should try to do it all at > once. Like, for example, your other change to print out the toast > value ID seems like a good idea, and could apply to any new messages > as well as some existing ones. Maybe there are also more fields in the > TOAST pointer for which we could add checks. Of the toast pointer fields: int32 va_rawsize; /* Original data size (includes header) */ int32 va_extsize; /* External saved size (doesn't) */ Oid va_valueid; /* Unique ID of value within TOAST table */ Oid va_toastrelid; /* RelID of TOAST table containing it */ all seem worth getting as part of any toast error message, even if these fields themselves are not corrupt. It just makes it easier to understand the context of the error you're looking at. At first I tried putting these into each message, but it is very wordy to say things like "toast pointer with rawsize %u and extsize %u pointing at relation with oid %u" and such. It made more sense to just add these four fields to the verify_heapam tuple format. That saves putting them in the message text itself, and has the benefit that you could filter the rows coming from verify_heapam() for ones where valueid is or is not null, for example. This changes the external interface of verify_heapam, but I didn't bother with a amcheck--1.3--1.4.sql because amcheck--1.2--1.3. sql was added as part of the v14 development work and has not yet been released. My assumption is that I can just change it, rather than making a new upgrade file. These patches fix the visibility rules and add extra toast checking. Some of the previous patch material is not included, since it is not clear to me if you wanted to commit any of it. v9-0001-Fixing-amcheck-tuple-visibility-rules.patch Description: Binary data v9-0002-pg_amcheck-provide-additional-toast-corruption-in.patch Description: Binary data — Mark Dilger EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: pg_amcheck contrib application
On Mon, Mar 15, 2021 at 10:10 PM Mark Dilger wrote: > It is unfortunate that the failing test only runs pg_amcheck after creating > numerous corruptions, as we can't know if pg_amcheck would have complained > about pg_statistic before the corruptions were created in other tables, or if > it only does so after. The attached patch v7-0003 adds a call to pg_amcheck > after all tables are created and populated, but before any corruptions are > caused. This should help narrow down what is happening, and doesn't hurt to > leave in place long-term. > > I don't immediately see anything wrong with how pg_statistic uses a > pseudo-type, but it leads me to want to poke a bit more at pg_statistic on > hornet and tern, though I don't have any regression tests specifically for > doing so. > > Tests v7-0001 and v7-0002 are just repeats of the tests posted previously. Since we now know that shutting autovacuum off makes the problem go away, I don't see a reason to commit 0001. We should fix pg_amcheck instead, if, as presently seems to be the case, that's where the problem is. I just committed 0002. I think 0003 is probably a good idea, but I haven't committed it yet. As for 0004, it seems to me that we might want to do a little more rewording of these messages and perhaps we should try to do it all at once. Like, for example, your other change to print out the toast value ID seems like a good idea, and could apply to any new messages as well as some existing ones. Maybe there are also more fields in the TOAST pointer for which we could add checks. -- Robert Haas EDB: http://www.enterprisedb.com
Re: pg_amcheck contrib application
> On Mar 16, 2021, at 11:40 AM, Robert Haas wrote: > > On Tue, Mar 16, 2021 at 2:22 PM Tom Lane wrote: >> I'm circling back around to the idea that amcheck is trying to >> validate TOAST references that are already dead, and it's getting >> burnt because something-or-other has already removed the toast >> rows, though not the referencing datums. That's legal behavior >> once the rows are marked dead. Upthread it was claimed that >> amcheck isn't doing that, but this looks like a smoking gun to me. > > I think this theory has some legs. From check_tuple_header_and_visibilty(): > >else if (!(infomask & HEAP_XMAX_COMMITTED)) >return true;/* > HEAPTUPLE_DELETE_IN_PROGRESS or > * > HEAPTUPLE_LIVE */ >else >return false; /* > HEAPTUPLE_RECENTLY_DEAD or HEAPTUPLE_DEAD */ >} >return true;/* not dead */ > } > > That first case looks wrong to me. Don't we need to call > get_xid_status() here, Mark? As coded, it seems that if the xmin is ok > and the xmax is marked committed, we consider the tuple checkable. The > comment says it must be HEAPTUPLE_DELETE_IN_PROGRESS or > HEAPTUPLE_LIVE, but it seems to me that if the actual answer is either > HEAPTUPLE_RECENTLY_DEAD or HEAPTUPLE_DEAD depending on whether the > xmax is all-visible. And in the second case the comment says it's > either HEAPTUPLE_RECENTLY_DEAD or HEAPTUPLE_DEAD, but I think in that > case it's either HEAPTUPLE_DELETE_IN_PROGRESS or > HEAPTUPLE_RECENTLY_DEAD or HEAPTUPLE_DEAD, depending on the XID > status. > > Another thought here is that it's probably not wicked smart to be > relying on the hint bits to match the actual status of the tuple in > cases of corruption. Maybe we should be warning about tuples that are > have xmin or xmax flagged as committed or invalid when in fact clog > disagrees. That's not a particularly uncommon case, and it's hard to > check. This code was not committed as part of the recent pg_amcheck work, but longer ago, and I'm having trouble reconstructing exactly why it was written that way. Changing check_tuple_header_and_visibilty() fixes the regression test and also manual tests against the "regression" database that I've been using. I'd like to ponder the changes a while longer before I post, but the fact that these changes fix the tests seems to add credibility to this theory. — Mark Dilger EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: pg_amcheck contrib application
On 3/13/21 1:30 AM, Andres Freund wrote: > Hi, > > On 2021-03-13 01:22:54 -0500, Tom Lane wrote: >> Mark Dilger writes: >>> On Mar 12, 2021, at 10:16 PM, Noah Misch wrote: hoverfly does configure with PERL=perl64. /usr/bin/prove is from the 32-bit Perl, so I suspect the TAP suites get 32-bit Perl that way. (There's no "prove64".) >> Oh, that's annoying. > I suspect we could solve that by invoking changing our /usr/bin/prove > invocation to instead be PERL /usr/bin/prove? That might be a good thing > independent of this issue, because it's not unreasonable for a user to > expect that we'd actually use the perl installation they configured... > > Although I do not know how prove determines the perl installation it's > going to use for the test scripts... > There's a very good chance this would break msys builds, which are configured to build against a pure native (i.e. non-msys) perl sucj as AS or Strawberry, but need to run msys perl for TAP tests, so it gets the paths right. (Don't get me started on the madness that can result from managing this. I've lost weeks of my life to it ... If you add cygwin into the mix and you're trying to coordinate builds among three buildfarm animals it's a major pain.) cheers andrew -- Andrew Dunstan EDB: https://www.enterprisedb.com
Re: pg_amcheck contrib application
On Tue, Mar 16, 2021 at 2:45 PM Tom Lane wrote: > > In what context is it OK to just add extra alignment padding? > > It's *not* extra, according to pg_statistic's tuple descriptor. > Both forming and deforming of pg_statistic tuples should honor > that and locate stavaluesX values on d-aligned boundaries. > > It could be that a particular entry is of an array type that > only requires i-alignment. But that doesn't break anything, > it just means we inserted more padding than an omniscient > implementation would do. OK, yeah, I just misunderstood what you were saying. -- Robert Haas EDB: http://www.enterprisedb.com
Re: pg_amcheck contrib application
Mark Dilger writes: > CopyStatistics seems to just copy Form_pg_statistic without regard for > who owns the toast. Is this safe? No less so than a ton of other places that insert values that might've come from on-disk storage. heap_toast_insert_or_update() is responsible for dealing with the problem. These days it looks like it's actually toast_tuple_init() that takes care of dereferencing previously-toasted values during an INSERT. regards, tom lane
Re: pg_amcheck contrib application
Robert Haas writes: > On Tue, Mar 16, 2021 at 1:48 PM Tom Lane wrote: >> No. What should be happening there is that some arrays in the column >> get larger alignment than they actually need, but that shouldn't cause >> a problem (and has not done so, AFAIK, in the decades that it's been >> like this). As you say, deforming the tuple is going to rely on the >> table's tupdesc for alignment; it can't know what is in the data. > OK, I don't understand this. attalign is 'd', which is already the > maximum possible, and even if it weren't, individual rows can't decide > to use a larger OR smaller alignment than expected without breaking > stuff. > In what context is it OK to just add extra alignment padding? It's *not* extra, according to pg_statistic's tuple descriptor. Both forming and deforming of pg_statistic tuples should honor that and locate stavaluesX values on d-aligned boundaries. It could be that a particular entry is of an array type that only requires i-alignment. But that doesn't break anything, it just means we inserted more padding than an omniscient implementation would do. (I suppose in some parallel universe there could be a machine where i-alignment is stricter than d-alignment, and then we'd have trouble.) regards, tom lane
Re: pg_amcheck contrib application
On Tue, Mar 16, 2021 at 2:22 PM Tom Lane wrote: > I'm circling back around to the idea that amcheck is trying to > validate TOAST references that are already dead, and it's getting > burnt because something-or-other has already removed the toast > rows, though not the referencing datums. That's legal behavior > once the rows are marked dead. Upthread it was claimed that > amcheck isn't doing that, but this looks like a smoking gun to me. I think this theory has some legs. From check_tuple_header_and_visibilty(): else if (!(infomask & HEAP_XMAX_COMMITTED)) return true;/* HEAPTUPLE_DELETE_IN_PROGRESS or * HEAPTUPLE_LIVE */ else return false; /* HEAPTUPLE_RECENTLY_DEAD or HEAPTUPLE_DEAD */ } return true;/* not dead */ } That first case looks wrong to me. Don't we need to call get_xid_status() here, Mark? As coded, it seems that if the xmin is ok and the xmax is marked committed, we consider the tuple checkable. The comment says it must be HEAPTUPLE_DELETE_IN_PROGRESS or HEAPTUPLE_LIVE, but it seems to me that if the actual answer is either HEAPTUPLE_RECENTLY_DEAD or HEAPTUPLE_DEAD depending on whether the xmax is all-visible. And in the second case the comment says it's either HEAPTUPLE_RECENTLY_DEAD or HEAPTUPLE_DEAD, but I think in that case it's either HEAPTUPLE_DELETE_IN_PROGRESS or HEAPTUPLE_RECENTLY_DEAD or HEAPTUPLE_DEAD, depending on the XID status. Another thought here is that it's probably not wicked smart to be relying on the hint bits to match the actual status of the tuple in cases of corruption. Maybe we should be warning about tuples that are have xmin or xmax flagged as committed or invalid when in fact clog disagrees. That's not a particularly uncommon case, and it's hard to check. -- Robert Haas EDB: http://www.enterprisedb.com
Re: pg_amcheck contrib application
> On Mar 16, 2021, at 10:48 AM, Tom Lane wrote: > > I'm not entirely sure what's going on, but I think coming at this > with the mindset that "amcheck has detected some corruption" is > just going to lead you astray. Almost certainly, it's "amcheck > is incorrectly claiming corruption". That might be from mis-decoding > a TOAST-referencing datum. (Too bad the message doesn't report the > TOAST OID it probed for, so we can see if that's sane or not.) CopyStatistics seems to just copy Form_pg_statistic without regard for who owns the toast. Is this safe? Looking at RemoveStatistics, I'm not sure that it is. Anybody more familiar with this code care to give an opinion? — Mark Dilger EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: pg_amcheck contrib application
On Tue, Mar 16, 2021 at 1:48 PM Tom Lane wrote: > No. What should be happening there is that some arrays in the column > get larger alignment than they actually need, but that shouldn't cause > a problem (and has not done so, AFAIK, in the decades that it's been > like this). As you say, deforming the tuple is going to rely on the > table's tupdesc for alignment; it can't know what is in the data. OK, I don't understand this. attalign is 'd', which is already the maximum possible, and even if it weren't, individual rows can't decide to use a larger OR smaller alignment than expected without breaking stuff. In what context is it OK to just add extra alignment padding? -- Robert Haas EDB: http://www.enterprisedb.com
Re: pg_amcheck contrib application
Mark Dilger writes: > On Mar 16, 2021, at 10:48 AM, Tom Lane wrote: >> (Too bad the message doesn't report the >> TOAST OID it probed for, so we can see if that's sane or not.) > I've added that and now get the toast pointer's va_valueid in the message: > heap table "postgres"."pg_catalog"."pg_statistic", block 4, offset 2, > attribute 29: > toasted value id 13227 for attribute 29 missing from toast table > heap table "postgres"."pg_catalog"."pg_statistic", block 4, offset 5, > attribute 27: > toasted value id 13228 for attribute 27 missing from toast table That's awfully interesting, because OIDs less than 16384 should only be generated during initdb. So what we seem to be looking at here is pg_statistic entries that were made during the ANALYZE done by initdb (cf. vacuum_db()), and then were deleted during a later auto-analyze (in the buildfarm) or deliberate analyze (in your reproducer). But if they're deleted, why is amcheck looking for them? I'm circling back around to the idea that amcheck is trying to validate TOAST references that are already dead, and it's getting burnt because something-or-other has already removed the toast rows, though not the referencing datums. That's legal behavior once the rows are marked dead. Upthread it was claimed that amcheck isn't doing that, but this looks like a smoking gun to me. regards, tom lane
Re: pg_amcheck contrib application
> On Mar 16, 2021, at 10:48 AM, Tom Lane wrote: > > Robert Haas writes: >> On Tue, Mar 16, 2021 at 12:51 PM Mark Dilger >> wrote: >>> It shows them all has having attalign = 'd', but for some array types the >>> alignment will be 'i', not 'd'. So it's lying a bit about the contents. >>> But I'm now confused why this caused problems on the two hosts where >>> integer and double have the same alignment? It seems like that would be >>> the one place where the bug would not happen, not the one place where it >>> does. > >> Wait, so the value in the attalign column isn't the alignment we're >> actually using? I can understand how we might generate tuples like >> that, if we pass the actual type to construct_array(), but shouldn't >> we then get garbage when we deform the tuple? > > No. What should be happening there is that some arrays in the column > get larger alignment than they actually need, but that shouldn't cause > a problem (and has not done so, AFAIK, in the decades that it's been > like this). As you say, deforming the tuple is going to rely on the > table's tupdesc for alignment; it can't know what is in the data. > > I'm not entirely sure what's going on, but I think coming at this > with the mindset that "amcheck has detected some corruption" is > just going to lead you astray. Almost certainly, it's "amcheck > is incorrectly claiming corruption". That might be from mis-decoding > a TOAST-referencing datum. (Too bad the message doesn't report the > TOAST OID it probed for, so we can see if that's sane or not.) I've added that and now get the toast pointer's va_valueid in the message: mark.dilger@laptop280-ma-us amcheck % pg_amcheck -t "pg_catalog.pg_statistic" postgres heap table "postgres"."pg_catalog"."pg_statistic", block 4, offset 2, attribute 29: toasted value id 13227 for attribute 29 missing from toast table heap table "postgres"."pg_catalog"."pg_statistic", block 4, offset 5, attribute 27: toasted value id 13228 for attribute 27 missing from toast table diff --git a/contrib/amcheck/verify_heapam.c b/contrib/amcheck/verify_heapam.c index 5ccae46375..a0be71bb7f 100644 --- a/contrib/amcheck/verify_heapam.c +++ b/contrib/amcheck/verify_heapam.c @@ -,8 +,8 @@ check_tuple_attribute(HeapCheckContext *ctx) } if (!found_toasttup) report_corruption(ctx, - psprintf("toasted value for attribute %u missing from toast table", - ctx->attnum)); + psprintf("toasted value id %u for attribute %u missing from toast table", + toast_pointer.va_valueid, ctx->attnum)); else if (ctx->chunkno != (ctx->endchunk + 1)) report_corruption(ctx, psprintf("final toast chunk number %u differs from expected value %u", — Mark Dilger EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: pg_amcheck contrib application
... btw, I now see that tern and hornet are passing this test at least as much as they're failing, proving that there's some timing or random chance involved. That doesn't completely eliminate the idea that there may be an architecture component to the issue, but it sure reduces its credibility. I now believe the theory that the triggering condition is an auto-analyze happening at the right time, and populating pg_statistic with some data that other runs don't see. regards, tom lane
Re: pg_amcheck contrib application
Robert Haas writes: > On Tue, Mar 16, 2021 at 12:51 PM Mark Dilger > wrote: >> It shows them all has having attalign = 'd', but for some array types the >> alignment will be 'i', not 'd'. So it's lying a bit about the contents. >> But I'm now confused why this caused problems on the two hosts where integer >> and double have the same alignment? It seems like that would be the one >> place where the bug would not happen, not the one place where it does. > Wait, so the value in the attalign column isn't the alignment we're > actually using? I can understand how we might generate tuples like > that, if we pass the actual type to construct_array(), but shouldn't > we then get garbage when we deform the tuple? No. What should be happening there is that some arrays in the column get larger alignment than they actually need, but that shouldn't cause a problem (and has not done so, AFAIK, in the decades that it's been like this). As you say, deforming the tuple is going to rely on the table's tupdesc for alignment; it can't know what is in the data. I'm not entirely sure what's going on, but I think coming at this with the mindset that "amcheck has detected some corruption" is just going to lead you astray. Almost certainly, it's "amcheck is incorrectly claiming corruption". That might be from mis-decoding a TOAST-referencing datum. (Too bad the message doesn't report the TOAST OID it probed for, so we can see if that's sane or not.) regards, tom lane
Re: pg_amcheck contrib application
On Tue, Mar 16, 2021 at 12:51 PM Mark Dilger wrote: > Yeah, that looks related: > > regression=# select attname, attlen, attnum, attalign from pg_attribute where > attrelid = 2619 and attname like 'stavalue%'; > attname | attlen | attnum | attalign > +++-- > stavalues1 | -1 | 27 | d > stavalues2 | -1 | 28 | d > stavalues3 | -1 | 29 | d > stavalues4 | -1 | 30 | d > stavalues5 | -1 | 31 | d > (5 rows) > > It shows them all has having attalign = 'd', but for some array types the > alignment will be 'i', not 'd'. So it's lying a bit about the contents. But > I'm now confused why this caused problems on the two hosts where integer and > double have the same alignment? It seems like that would be the one place > where the bug would not happen, not the one place where it does. Wait, so the value in the attalign column isn't the alignment we're actually using? I can understand how we might generate tuples like that, if we pass the actual type to construct_array(), but shouldn't we then get garbage when we deform the tuple? I mean, heap_deform_tuple() is going to get the alignment from the tuple descriptor, which is a table property. It doesn't (and can't) know what type of value is stored inside any of these fields for real, right? In other words, isn't this actually corruption, caused by a bug in our code, and how have we not noticed it before now? -- Robert Haas EDB: http://www.enterprisedb.com
Re: pg_amcheck contrib application
> On Mar 16, 2021, at 9:30 AM, Mark Dilger wrote: > > > >> On Mar 16, 2021, at 9:07 AM, Tom Lane wrote: >> >> Mark Dilger writes: >>> I think autovacuum simply triggers the bug, and is not the cause of the >>> bug. If I turn autovacuum off and instead do an ANALYZE in each test >>> database rather than performing the corruptions, I get reports about >>> problems in pg_statistic. This is on my mac laptop. This rules out the >>> theory that autovacuum is propogating corruptions into pg_statistic, and >>> also the theory that it is architecture dependent. >> >> I wonder whether amcheck is confused by the declaration of those columns >> as "anyarray". > > It uses attlen and attalign for the attribute, so that idea does make sense. > It gets that via TupleDescAttr(RelationGetDescr(rel), attnum). Yeah, that looks related: regression=# select attname, attlen, attnum, attalign from pg_attribute where attrelid = 2619 and attname like 'stavalue%'; attname | attlen | attnum | attalign +++-- stavalues1 | -1 | 27 | d stavalues2 | -1 | 28 | d stavalues3 | -1 | 29 | d stavalues4 | -1 | 30 | d stavalues5 | -1 | 31 | d (5 rows) It shows them all has having attalign = 'd', but for some array types the alignment will be 'i', not 'd'. So it's lying a bit about the contents. But I'm now confused why this caused problems on the two hosts where integer and double have the same alignment? It seems like that would be the one place where the bug would not happen, not the one place where it does. I'm attaching a test that reliably reproduces this for me: v8-0005-Adding-pg_amcheck-special-test-for-pg_statistic.patch Description: Binary data — Mark Dilger EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: pg_amcheck contrib application
> On Mar 16, 2021, at 9:07 AM, Tom Lane wrote: > > Mark Dilger writes: >> I think autovacuum simply triggers the bug, and is not the cause of the bug. >> If I turn autovacuum off and instead do an ANALYZE in each test database >> rather than performing the corruptions, I get reports about problems in >> pg_statistic. This is on my mac laptop. This rules out the theory that >> autovacuum is propogating corruptions into pg_statistic, and also the theory >> that it is architecture dependent. > > I wonder whether amcheck is confused by the declaration of those columns > as "anyarray". It uses attlen and attalign for the attribute, so that idea does make sense. It gets that via TupleDescAttr(RelationGetDescr(rel), attnum). — Mark Dilger EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: pg_amcheck contrib application
Mark Dilger writes: > I think autovacuum simply triggers the bug, and is not the cause of the bug. > If I turn autovacuum off and instead do an ANALYZE in each test database > rather than performing the corruptions, I get reports about problems in > pg_statistic. This is on my mac laptop. This rules out the theory that > autovacuum is propogating corruptions into pg_statistic, and also the theory > that it is architecture dependent. I wonder whether amcheck is confused by the declaration of those columns as "anyarray". regards, tom lane
Re: pg_amcheck contrib application
> On Mar 15, 2021, at 11:09 PM, Noah Misch wrote: > >> Not sure that I believe the theory that this is from bad luck of >> concurrent autovacuum timing, though. > > With autovacuum_naptime=1s, on hornet, the failure reproduced twice in twelve > runs. With v6-0001-Turning-off-autovacuum-during-corruption-tests.patch > applied, 196 runs all succeeded. > >> The fact that we're seeing >> this on just those two animals suggests strongly to me that it's >> architecture-correlated, instead. > > That is possible. I think autovacuum simply triggers the bug, and is not the cause of the bug. If I turn autovacuum off and instead do an ANALYZE in each test database rather than performing the corruptions, I get reports about problems in pg_statistic. This is on my mac laptop. This rules out the theory that autovacuum is propogating corruptions into pg_statistic, and also the theory that it is architecture dependent. I'll investigate further. — Mark Dilger EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: pg_amcheck contrib application
On Mon, Mar 15, 2021 at 02:57:20PM -0400, Tom Lane wrote: > Mark Dilger writes: > > On Mar 15, 2021, at 10:04 AM, Tom Lane wrote: > >> These animals have somewhat weird alignment properties: MAXALIGN is 8 > >> but ALIGNOF_DOUBLE is only 4. I speculate that that is affecting their > >> choices about whether an out-of-line TOAST value is needed, breaking > >> this test case. That machine also has awful performance for filesystem metadata operations, like open(O_CREAT). Its CPU and read()/write() performance are normal. > > The logic in verify_heapam only looks for a value in the toast table if > > the tuple it gets from the main table (in this case, from pg_statistic) > > has an attribute that claims to be toasted. The error message we're > > seeing that you quoted above simply means that no entry exists in the > > toast table. > > Yeah, that could be phrased better. Do we have a strong enough lock on > the table under examination to be sure that autovacuum couldn't remove > a dead toast entry before we reach it? But this would only be an > issue if we are trying to check validity of toasted fields within > known-dead tuples, which I would argue we shouldn't, since lock or > no lock there's no guarantee the toast entry is still there. > > Not sure that I believe the theory that this is from bad luck of > concurrent autovacuum timing, though. With autovacuum_naptime=1s, on hornet, the failure reproduced twice in twelve runs. With v6-0001-Turning-off-autovacuum-during-corruption-tests.patch applied, 196 runs all succeeded. > The fact that we're seeing > this on just those two animals suggests strongly to me that it's > architecture-correlated, instead. That is possible.
Re: pg_amcheck contrib application
> On Mar 15, 2021, at 11:57 AM, Tom Lane wrote: > > Not sure that I believe the theory that this is from bad luck of > concurrent autovacuum timing, though. The fact that we're seeing > this on just those two animals suggests strongly to me that it's > architecture-correlated, instead. I find it a little hard to see how amcheck is tripping over a toasted value just in this one table, pg_statistic, and not in any of the others. The error message says the problem is in attribute 27, which I believe means it is stavalues2. The comment in the header for this catalog is intriguing: /* * Values in these arrays are values of the column's data type, or of some * related type such as an array element type. We presently have to cheat * quite a bit to allow polymorphic arrays of this kind, but perhaps * someday it'll be a less bogus facility. */ anyarraystavalues1; anyarraystavalues2; anyarraystavalues3; anyarraystavalues4; anyarraystavalues5; This is hard to duplicate in a test, because you can't normally create tables with pseudo-type columns. However, if amcheck is walking the tuple and does not correctly update the offset with the length of attribute 26, it may try to read attribute 27 at the wrong offset, unsurprisingly leading to garbage, perhaps a garbage toast pointer. The attached patch v7-0004 adds a check to verify_heapam to see if the va_toastrelid matches the expected toast table oid for the table we're reading. That check almost certainly should have been included in the initial version of verify_heapam, so even if it does nothing to help us in this issue, it's good that it be committed, I think. It is unfortunate that the failing test only runs pg_amcheck after creating numerous corruptions, as we can't know if pg_amcheck would have complained about pg_statistic before the corruptions were created in other tables, or if it only does so after. The attached patch v7-0003 adds a call to pg_amcheck after all tables are created and populated, but before any corruptions are caused. This should help narrow down what is happening, and doesn't hurt to leave in place long-term. I don't immediately see anything wrong with how pg_statistic uses a pseudo-type, but it leads me to want to poke a bit more at pg_statistic on hornet and tern, though I don't have any regression tests specifically for doing so. Tests v7-0001 and v7-0002 are just repeats of the tests posted previously. v7-0001-Turning-off-autovacuum-during-corruption-tests.patch Description: Binary data v7-0002-Fixing-a-confusing-amcheck-corruption-message.patch Description: Binary data v7-0003-Extend-pg_amcheck-test-suite.patch Description: Binary data v7-0004-Add-extra-check-of-toast-pointer-in-amcheck.patch Description: Binary data — Mark Dilger EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: pg_amcheck contrib application
> On Mar 15, 2021, at 11:57 AM, Tom Lane wrote: > > Yeah, that could be phrased better. Attaching the 0001 patch submitted earlier, plus 0002 which fixes the confusing corruption message. v6-0001-Turning-off-autovacuum-during-corruption-tests.patch Description: Binary data v6-0002-Fixing-a-confusing-amcheck-corruption-message.patch Description: Binary data — Mark Dilger EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: pg_amcheck contrib application
> On Mar 15, 2021, at 11:57 AM, Tom Lane wrote: > > Do we have a strong enough lock on > the table under examination to be sure that autovacuum couldn't remove > a dead toast entry before we reach it? The main table and the toast table are only locked with AccessShareLock. Each page in the main table is locked with BUFFER_LOCK_SHARE. Toast is not checked unless the tuple passes visibility checks verifying the tuple is not dead. > But this would only be an > issue if we are trying to check validity of toasted fields within > known-dead tuples, which I would argue we shouldn't, since lock or > no lock there's no guarantee the toast entry is still there. It does not intentionally check toasted fields within dead tuples. If that is happening, it's a bug, possibly in the visibility function. But I'm not seeing a specific reason to assume that is the issue. If we still see the complaint on tern or hornet after committing the patch to turn off autovacuum, we will be able to rule out the theory that the toast was removed by autovacuum. — Mark Dilger EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: pg_amcheck contrib application
Mark Dilger writes: > On Mar 15, 2021, at 10:04 AM, Tom Lane wrote: >> These animals have somewhat weird alignment properties: MAXALIGN is 8 >> but ALIGNOF_DOUBLE is only 4. I speculate that that is affecting their >> choices about whether an out-of-line TOAST value is needed, breaking >> this test case. > The logic in verify_heapam only looks for a value in the toast table if > the tuple it gets from the main table (in this case, from pg_statistic) > has an attribute that claims to be toasted. The error message we're > seeing that you quoted above simply means that no entry exists in the > toast table. Yeah, that could be phrased better. Do we have a strong enough lock on the table under examination to be sure that autovacuum couldn't remove a dead toast entry before we reach it? But this would only be an issue if we are trying to check validity of toasted fields within known-dead tuples, which I would argue we shouldn't, since lock or no lock there's no guarantee the toast entry is still there. Not sure that I believe the theory that this is from bad luck of concurrent autovacuum timing, though. The fact that we're seeing this on just those two animals suggests strongly to me that it's architecture-correlated, instead. regards, tom lane
Re: pg_amcheck contrib application
> On Mar 15, 2021, at 11:11 AM, Mark Dilger > wrote: > > I will submit a patch that turns off autovacuum for the test node shortly. v5-0001-Turning-off-autovacuum-during-corruption-tests.patch Description: Binary data — Mark Dilger EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: pg_amcheck contrib application
> On Mar 15, 2021, at 10:04 AM, Tom Lane wrote: > > Looks like we're not quite out of the woods, as hornet and tern are > still unhappy: > > # Failed test 'pg_amcheck excluding all corrupt schemas status (got 2 vs > expected 0)' > # at t/003_check.pl line 498. > > # Failed test 'pg_amcheck excluding all corrupt schemas stdout /(?^:^$)/' > # at t/003_check.pl line 498. > # 'heap table "db1"."pg_catalog"."pg_statistic", block 2, > offset 1, attribute 27: > # final toast chunk number 0 differs from expected value 1 > # heap table "db1"."pg_catalog"."pg_statistic", block 2, offset 1, attribute > 27: > # toasted value for attribute 27 missing from toast table > # ' > # doesn't match '(?^:^$)' > # Looks like you failed 2 tests of 60. > [12:18:06] t/003_check.pl ... > Dubious, test returned 2 (wstat 512, 0x200) > Failed 2/60 subtests > > These animals have somewhat weird alignment properties: MAXALIGN is 8 > but ALIGNOF_DOUBLE is only 4. I speculate that that is affecting their > choices about whether an out-of-line TOAST value is needed, breaking > this test case. The pg_amcheck test case is not corrupting any pg_catalog tables, but contrib/amcheck/verify_heapam is complaining about a corruption in pg_catalog.pg_statistic. The logic in verify_heapam only looks for a value in the toast table if the tuple it gets from the main table (in this case, from pg_statistic) has an attribute that claims to be toasted. The error message we're seeing that you quoted above simply means that no entry exists in the toast table. The bit about "final toast chunk number 0 differs from expected value 1" is super unhelpful, as what it is really saying is that there were no chunks found. I should submit a patch to not print that message in cases where the attribute is missing from the toast table. Is it possible that pg_statistic really is corrupt here, and that this is not a bug in pg_amcheck? It's not like we've been checking for corruption in the build farm up till now. I notice that this test, as well as test 005_opclass_damage.pl, neglects to turn off autovacuum for the test node. So maybe the corruptions are getting propogated during autovacuum? This is just a guess, but I will submit a patch that turns off autovacuum for the test node shortly. — Mark Dilger EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: pg_amcheck contrib application
Looks like we're not quite out of the woods, as hornet and tern are still unhappy: # Failed test 'pg_amcheck excluding all corrupt schemas status (got 2 vs expected 0)' # at t/003_check.pl line 498. # Failed test 'pg_amcheck excluding all corrupt schemas stdout /(?^:^$)/' # at t/003_check.pl line 498. # 'heap table "db1"."pg_catalog"."pg_statistic", block 2, offset 1, attribute 27: # final toast chunk number 0 differs from expected value 1 # heap table "db1"."pg_catalog"."pg_statistic", block 2, offset 1, attribute 27: # toasted value for attribute 27 missing from toast table # ' # doesn't match '(?^:^$)' # Looks like you failed 2 tests of 60. [12:18:06] t/003_check.pl ... Dubious, test returned 2 (wstat 512, 0x200) Failed 2/60 subtests These animals have somewhat weird alignment properties: MAXALIGN is 8 but ALIGNOF_DOUBLE is only 4. I speculate that that is affecting their choices about whether an out-of-line TOAST value is needed, breaking this test case. regards, tom lane
Re: pg_amcheck contrib application
On Sat, Mar 13, 2021 at 10:51:27AM -0800, Mark Dilger wrote: > > On Mar 13, 2021, at 10:46 AM, Noah Misch wrote: > > On Fri, Mar 12, 2021 at 05:04:09PM -0800, Mark Dilger wrote: > >>> On Mar 12, 2021, at 3:24 PM, Mark Dilger > >>> wrote: > >>> and the second deals with an apparent problem with IPC::Run shell > >>> expanding an asterisk on some platforms but not others. That second one, > >>> if true, seems like a problem with scope beyond the pg_amcheck project, > >>> as TestLib::command_checks_all uses IPC::Run, and it would be desirable > >>> to have consistent behavior across platforms. > > > >>> One of pg_amcheck's regression tests was passing an asterisk through > >>> TestLib's command_checks_all() command, which gets through to > >>> pg_amcheck without difficulty on most platforms, but appears to get > >>> shell expanded on Windows (jacana) and AIX (hoverfly). > > > > For posterity, I can't reproduce this on hoverfly. The suite fails the same > > way at f371a4c and f371a4c^. More-recently (commit 58f5749), the suite > > passes > > even after reverting f371a4c. Self-contained IPC::Run usage also does not > > corroborate the theory: > > > > [nm@power8-aix 8:0 2021-03-13T18:32:23 clean 0]$ perl -MIPC::Run -e > > 'IPC::Run::run "printf", "%s\n", "*"' > > * > > [nm@power8-aix 8:0 2021-03-13T18:32:29 clean 0]$ perl -MIPC::Run -e > > 'IPC::Run::run "sh", "-c", "printf %sn *"' > > COPYRIGHT > > GNUmakefile.in > > HISTORY > > Makefile > > README > > README.git > > aclocal.m4 > > config > > configure > > configure.ac > > contrib > > doc > > src > The reason I suspected that passing the '*' through IPC::Run had to do with > the error that pg_amcheck gave. It complained that too many arguments where > being passed to it, and that the first such argument was "pg_amcheck.c". > There's no reason pg_amcheck should know it's source file name, nor that the > regression test should know that, which suggested that the asterisk was being > shell expanded within the src/bin/pg_amcheck/ directory and the file listing > was being passed into pg_amcheck as arguments. I agree. I can reproduce the problem on Windows. Commit f371a4c fixed http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=fairywren=2021-03-12%2020%3A12%3A44 and I see logs of that kind of failure only on fairywren and jacana.
Re: pg_amcheck contrib application
> On Mar 13, 2021, at 10:46 AM, Noah Misch wrote: > > On Fri, Mar 12, 2021 at 05:04:09PM -0800, Mark Dilger wrote: >>> On Mar 12, 2021, at 3:24 PM, Mark Dilger >>> wrote: >>> >>> and the second deals with an apparent problem with IPC::Run shell expanding >>> an asterisk on some platforms but not others. That second one, if true, >>> seems like a problem with scope beyond the pg_amcheck project, as >>> TestLib::command_checks_all uses IPC::Run, and it would be desirable to >>> have consistent behavior across platforms. > >>> One of pg_amcheck's regression tests was passing an asterisk through >>> TestLib's command_checks_all() command, which gets through to >>> pg_amcheck without difficulty on most platforms, but appears to get >>> shell expanded on Windows (jacana) and AIX (hoverfly). > > For posterity, I can't reproduce this on hoverfly. The suite fails the same > way at f371a4c and f371a4c^. More-recently (commit 58f5749), the suite passes > even after reverting f371a4c. Self-contained IPC::Run usage also does not > corroborate the theory: > > [nm@power8-aix 8:0 2021-03-13T18:32:23 clean 0]$ perl -MIPC::Run -e > 'IPC::Run::run "printf", "%s\n", "*"' > * > [nm@power8-aix 8:0 2021-03-13T18:32:29 clean 0]$ perl -MIPC::Run -e > 'IPC::Run::run "sh", "-c", "printf %sn *"' > COPYRIGHT > GNUmakefile.in > HISTORY > Makefile > README > README.git > aclocal.m4 > config > configure > configure.ac > contrib > doc > src > >> there is a similar symptom caused by an unrelated problem > >> Subject: [PATCH v3] Avoid use of non-portable option ordering in >> command_checks_all(). > > On a glibc system, "env POSIXLY_CORRECT=1 make check ..." tests this. Thanks for investigating! The reason I suspected that passing the '*' through IPC::Run had to do with the error that pg_amcheck gave. It complained that too many arguments where being passed to it, and that the first such argument was "pg_amcheck.c". There's no reason pg_amcheck should know it's source file name, nor that the regression test should know that, which suggested that the asterisk was being shell expanded within the src/bin/pg_amcheck/ directory and the file listing was being passed into pg_amcheck as arguments. That theory may have been wrong, but it was the only theory I had at the time. I don't have access to any of the machines where that happened, so it is hard for me to investigate further. — Mark Dilger EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: pg_amcheck contrib application
On Fri, Mar 12, 2021 at 05:04:09PM -0800, Mark Dilger wrote: > > On Mar 12, 2021, at 3:24 PM, Mark Dilger > > wrote: > > > > and the second deals with an apparent problem with IPC::Run shell expanding > > an asterisk on some platforms but not others. That second one, if true, > > seems like a problem with scope beyond the pg_amcheck project, as > > TestLib::command_checks_all uses IPC::Run, and it would be desirable to > > have consistent behavior across platforms. > > One of pg_amcheck's regression tests was passing an asterisk through > > TestLib's command_checks_all() command, which gets through to > > pg_amcheck without difficulty on most platforms, but appears to get > > shell expanded on Windows (jacana) and AIX (hoverfly). For posterity, I can't reproduce this on hoverfly. The suite fails the same way at f371a4c and f371a4c^. More-recently (commit 58f5749), the suite passes even after reverting f371a4c. Self-contained IPC::Run usage also does not corroborate the theory: [nm@power8-aix 8:0 2021-03-13T18:32:23 clean 0]$ perl -MIPC::Run -e 'IPC::Run::run "printf", "%s\n", "*"' * [nm@power8-aix 8:0 2021-03-13T18:32:29 clean 0]$ perl -MIPC::Run -e 'IPC::Run::run "sh", "-c", "printf %sn *"' COPYRIGHT GNUmakefile.in HISTORY Makefile README README.git aclocal.m4 config configure configure.ac contrib doc src > there is a similar symptom caused by an unrelated problem > Subject: [PATCH v3] Avoid use of non-portable option ordering in > command_checks_all(). On a glibc system, "env POSIXLY_CORRECT=1 make check ..." tests this.
Re: pg_amcheck contrib application
On Sat, Mar 13, 2021 at 10:20 AM Mark Dilger wrote: > We want to use values that don't look like any of the others. The complete > set of nibbles in the values above is [012345678B], leaving the set [9ACDEF] > unused. The attached patch uses the value DEADF9F9 as it seems a little > easier to remember than other permutations of those nibbles. OK, committed. The nibbles seem less relevant than the bytes as a whole, but that's fine. -- Robert Haas EDB: http://www.enterprisedb.com
Re: pg_amcheck contrib application
> On Mar 13, 2021, at 6:50 AM, Tom Lane wrote: > > Robert Haas writes: >> I think it would be good to use a non-zero value here. We're doing a >> lot of poking into raw bytes here, and if something goes wrong, a zero >> value is more likely to look like something normal than whatever else. >> I suggest picking a value where all 8 bytes are the same, but not >> zero, and ideally chosen so that they don't look much like any of the >> surrounding bytes. > > Actually, it seems like we can let pack/unpack deal with byte-swapping > within 32-bit words; what we lose by not using 'q' format is just the > ability to correctly swap the two 32-bit words. Hence, any value in > which upper and lower halves are the same should work, say > 0x1234567812345678. > > regards, tom lane The heap tuple in question looks as follows, with in the spot we're debating: Tuple Layout (values in hex): t_xmin: 223 t_xmax: 0 t_field3: 0 bi_hi: 0 bi_lo: 0 ip_posid: 1 t_infomask2: 3 t_infomask: b06 t_hoff: 18 t_bits: 0 a_1: a_2: b_header: 11# little-endian, will be 88 on big endian b_body1: 61 b_body2: 62 b_body3: 63 b_body4: 64 b_body5: 65 b_body6: 66 b_body7: 67 c_va_header: 1 c_va_vartag: 12 c_va_rawsize: 2714 c_va_extsize: 2710 valueid and toastrelid are not shown, as they won't be stable. Relying on t_xmin to be stable makes the test brittle, but fortunately that is separated from a_1 and a_2 far enough that we should not need to worry about it. We want to use values that don't look like any of the others. The complete set of nibbles in the values above is [012345678B], leaving the set [9ACDEF] unused. The attached patch uses the value DEADF9F9 as it seems a little easier to remember than other permutations of those nibbles. v6-0001-pg_amcheck-continuing-to-fix-portability-problems.patch Description: Binary data — Mark Dilger EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company