Re: pg_amcheck contrib application

2021-05-03 Thread Robert Haas
On Fri, Apr 30, 2021 at 5:07 PM Robert Haas  wrote:
> On Fri, Apr 30, 2021 at 4:26 PM Mark Dilger
>  wrote:
> > After further reflection, no other verbiage seems any better.  I'd say go 
> > ahead and commit it this way.
>
> OK. I'll plan to do that on Monday, barring objections.

Done now.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: pg_amcheck contrib application

2021-04-30 Thread Robert Haas
On Fri, Apr 30, 2021 at 4:26 PM Mark Dilger
 wrote:
> After further reflection, no other verbiage seems any better.  I'd say go 
> ahead and commit it this way.

OK. I'll plan to do that on Monday, barring objections.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: pg_amcheck contrib application

2021-04-30 Thread Mark Dilger



> On Apr 30, 2021, at 1:04 PM, Mark Dilger  wrote:
> 
>> toast value %u was expected to end at chunk %d, but ended while
>> expecting chunk %d
>> 
>> i.e. same as the currently-committed code, except for changing "ended
>> at" to "ended while expecting."
> 
> I find the grammar of this new formulation anomalous, for hard-to-articulate 
> reasons not quite the same as but akin to mismatched verb aspect.

After further reflection, no other verbiage seems any better.  I'd say go ahead 
and commit it this way.

—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company







Re: pg_amcheck contrib application

2021-04-30 Thread Mark Dilger



> On Apr 30, 2021, at 12:47 PM, Robert Haas  wrote:
> 
> Hmm, I think that might need adjustment, actually. What I was trying
> to do is compensate for the fact that what we now have is the next
> chunk_seq value we expect, not the last one we saw, nor the total
> number of chunks we've seen regardless of what chunk_seq they had. But
> I thought it would be too confusing to just give the chunk number we
> were expecting and not say anything about how many chunks we thought
> there would be in total. So maybe what I should do is change it to
> something like this:
> 
> toast value %u was expected to end at chunk %d, but ended while
> expecting chunk %d
> 
> i.e. same as the currently-committed code, except for changing "ended
> at" to "ended while expecting."

I find the grammar of this new formulation anomalous, for hard-to-articulate 
reasons not quite the same as but akin to mismatched verb aspect.

—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company







Re: pg_amcheck contrib application

2021-04-30 Thread Robert Haas
On Fri, Apr 30, 2021 at 3:41 PM Mark Dilger
 wrote:
> I think that's committable.
>
> The only nitpick might be
>
> -   psprintf("toast value %u was expected to end 
> at chunk %d, but ended at chunk %d",
> +   psprintf("toast value %u index scan ended 
> early while expecting chunk %d of %d",
>
> When reporting to users about positions within a zero-based indexing scheme, 
> what does "while expecting chunk 3 of 4" mean?  Is it talking about the last 
> chunk from the set [0..3] which has cardinality 4, or does it mean the 
> next-to-last chunk from [0..4] which ends with chunk 4, or what?  The prior 
> language isn't any more clear than what you have here, so I have no objection 
> to committing this, but the prior language was probably as goofy as it was 
> because it was trying to deal with this issue.

Hmm, I think that might need adjustment, actually. What I was trying
to do is compensate for the fact that what we now have is the next
chunk_seq value we expect, not the last one we saw, nor the total
number of chunks we've seen regardless of what chunk_seq they had. But
I thought it would be too confusing to just give the chunk number we
were expecting and not say anything about how many chunks we thought
there would be in total. So maybe what I should do is change it to
something like this:

toast value %u was expected to end at chunk %d, but ended while
expecting chunk %d

i.e. same as the currently-committed code, except for changing "ended
at" to "ended while expecting."

--
Robert Haas
EDB: http://www.enterprisedb.com




Re: pg_amcheck contrib application

2021-04-30 Thread Mark Dilger



> On Apr 30, 2021, at 12:29 PM, Robert Haas  wrote:
> 
> OK, how about this version?

I think that's committable.

The only nitpick might be

-   psprintf("toast value %u was expected to end at 
chunk %d, but ended at chunk %d",
+   psprintf("toast value %u index scan ended early 
while expecting chunk %d of %d",

When reporting to users about positions within a zero-based indexing scheme, 
what does "while expecting chunk 3 of 4" mean?  Is it talking about the last 
chunk from the set [0..3] which has cardinality 4, or does it mean the 
next-to-last chunk from [0..4] which ends with chunk 4, or what?  The prior 
language isn't any more clear than what you have here, so I have no objection 
to committing this, but the prior language was probably as goofy as it was 
because it was trying to deal with this issue.

Thoughts?

—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company







Re: pg_amcheck contrib application

2021-04-30 Thread Robert Haas
On Fri, Apr 30, 2021 at 3:26 PM Mark Dilger
 wrote:
> It looks mostly good to me.  There is an off-by-one error introduced with:
>
> -   else if (chunkno != (endchunk + 1))
> +   else if (expected_chunk_seq < last_chunk_seq)
>
> I think that needs to be
>
> +  else if (expected_chunk_seq <= last_chunk_seq)
>
> because otherwise it won't complain if the only missing chunk is the very 
> last one.

OK, how about this version?

-- 
Robert Haas
EDB: http://www.enterprisedb.com


simply-remove-chunkno-concept-v4.patch
Description: Binary data


Re: pg_amcheck contrib application

2021-04-30 Thread Mark Dilger



> On Apr 30, 2021, at 11:56 AM, Robert Haas  wrote:
> 
> Can you review this version?

It looks mostly good to me.  There is an off-by-one error introduced with:

-   else if (chunkno != (endchunk + 1))
+   else if (expected_chunk_seq < last_chunk_seq)

I think that needs to be

+  else if (expected_chunk_seq <= last_chunk_seq)

because otherwise it won't complain if the only missing chunk is the very last 
one.

—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company







Re: pg_amcheck contrib application

2021-04-30 Thread Mark Dilger


> On Apr 30, 2021, at 11:56 AM, Robert Haas  wrote:
> 
> On Fri, Apr 30, 2021 at 2:31 PM Mark Dilger
>  wrote:
>> Just to be clear, I did not use your patch v1 as the starting point.
> 
> I thought that might be the case, but I was trying to understand what
> you didn't like about my version, and comparing them seemed like a way
> to figure that out.
> 
>> I took the code as committed to master as the starting point, used your 
>> corruption report verbiage changes and at least some of your variable naming 
>> choices, but did not use the rest, in large part because it didn't work.  It 
>> caused corruption messages to be reported against tables that have no 
>> corruption.  For that matter, your v2 patch doesn't work either, and in the 
>> same way.  To wit:
>> 
>>  heap table "postgres"."pg_catalog"."pg_rewrite", block 6, offset 4, 
>> attribute 7:
>> toast value 13461 chunk 0 has size 1995, but expected size 1996
>> 
>> I think there is something wrong with the way you are trying to calculate 
>> and use extsize, because I'm not corrupting pg_catalog.pg_rewrite.  You can 
>> get these same results by applying your patch to master, building, and 
>> running 'make check' from src/bin/pg_amcheck/
> 
> Argh, OK, I didn't realize. Should be fixed in this version.
> 
>>> 4. You don't return if chunk_seq > last_chunk_seq. That seems wrong,
>>> because we cannot compute a sensible expected size in that case. I
>>> think your code will subtract a larger value from a smaller one and,
>>> this being unsigned arithmetic, say that the expected chunk size is
>>> something gigantic.
>> 
>> Your conclusion is probably right, but I think your analysis is based on a 
>> misreading of what "last_chunk_seq" means.  It's not the last one seen, but 
>> the last one expected.  (Should we rename the variable to avoid confusion?)  
>> It won't compute a gigantic size.  Rather, it will expect *every* chunk with 
>> chunk_seq >= last_chunk_seq to have whatever size is appropriate for the 
>> last chunk.
> 
> I realize it's the last one expected. That's the point: we don't have
> any expectation for the sizes of chunks higher than the last one we
> expected to see. If the value is 2000 bytes and the chunk size is 1996
> bytes, we expect chunk 0 to be 1996 bytes and chunk 1 to be 4 bytes.
> If not, we can complain. But it makes no sense to complain about chunk
> 2 being of a size we don't expect. We don't expect it to exist in the
> first place, so we have no notion of what size it ought to be.
> 
>> If we have seen any chunks, the variable is holding the expected next chunk 
>> seq, which is one greater than the last chunk seq we saw.
>> 
>> If we expect chunks 0..3 and see chunk 0 but not chunk 1, it will complain 
>> ..."expected to end at chunk 4, but ended at chunk 1".  This is clearly by 
>> design and not merely a bug, though I tend to agree with you that this is a 
>> strange wording choice.  I can't remember exactly when and how we decided to 
>> word the message this way, but it has annoyed me for a while, and I assumed 
>> it was something you suggested a while back, because I don't recall doing 
>> it.  Either way, since you seem to also be bothered by this, I agree we 
>> should change it.
> 
> Can you review this version?
> 
> -- 
> Robert Haas
> EDB: http://www.enterprisedb.com
> 

As requested off-list, here are NOT FOR COMMIT, WIP patches for testing only.

The first patch allows toast tables to be updated and adds regression tests of 
corrupted toasted attributes.  I never quite got deletes from toast tables to 
work, and there are probably other gotchas still lurking even with inserts and 
updates, but it limps along well enough for testing pg_amcheck.

The second patch updates the expected output of pg_amcheck to match the 
verbiage that you suggested upthread.



v1-0001-Adding-modify-toast-and-test-pg_amcheck.patch.WIP
Description: Binary data


v1-0002-Modifying-toast-corruption-test-expected-output.patch.WIP
Description: Binary data


—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company





Re: pg_amcheck contrib application

2021-04-30 Thread Robert Haas
On Fri, Apr 30, 2021 at 2:31 PM Mark Dilger
 wrote:
> Just to be clear, I did not use your patch v1 as the starting point.

I thought that might be the case, but I was trying to understand what
you didn't like about my version, and comparing them seemed like a way
to figure that out.

> I took the code as committed to master as the starting point, used your 
> corruption report verbiage changes and at least some of your variable naming 
> choices, but did not use the rest, in large part because it didn't work.  It 
> caused corruption messages to be reported against tables that have no 
> corruption.  For that matter, your v2 patch doesn't work either, and in the 
> same way.  To wit:
>
>   heap table "postgres"."pg_catalog"."pg_rewrite", block 6, offset 4, 
> attribute 7:
>  toast value 13461 chunk 0 has size 1995, but expected size 1996
>
> I think there is something wrong with the way you are trying to calculate and 
> use extsize, because I'm not corrupting pg_catalog.pg_rewrite.  You can get 
> these same results by applying your patch to master, building, and running 
> 'make check' from src/bin/pg_amcheck/

Argh, OK, I didn't realize. Should be fixed in this version.

> > 4. You don't return if chunk_seq > last_chunk_seq. That seems wrong,
> > because we cannot compute a sensible expected size in that case. I
> > think your code will subtract a larger value from a smaller one and,
> > this being unsigned arithmetic, say that the expected chunk size is
> > something gigantic.
>
> Your conclusion is probably right, but I think your analysis is based on a 
> misreading of what "last_chunk_seq" means.  It's not the last one seen, but 
> the last one expected.  (Should we rename the variable to avoid confusion?)  
> It won't compute a gigantic size.  Rather, it will expect *every* chunk with 
> chunk_seq >= last_chunk_seq to have whatever size is appropriate for the last 
> chunk.

I realize it's the last one expected. That's the point: we don't have
any expectation for the sizes of chunks higher than the last one we
expected to see. If the value is 2000 bytes and the chunk size is 1996
bytes, we expect chunk 0 to be 1996 bytes and chunk 1 to be 4 bytes.
If not, we can complain. But it makes no sense to complain about chunk
2 being of a size we don't expect. We don't expect it to exist in the
first place, so we have no notion of what size it ought to be.

> If we have seen any chunks, the variable is holding the expected next chunk 
> seq, which is one greater than the last chunk seq we saw.
>
> If we expect chunks 0..3 and see chunk 0 but not chunk 1, it will complain 
> ..."expected to end at chunk 4, but ended at chunk 1".  This is clearly by 
> design and not merely a bug, though I tend to agree with you that this is a 
> strange wording choice.  I can't remember exactly when and how we decided to 
> word the message this way, but it has annoyed me for a while, and I assumed 
> it was something you suggested a while back, because I don't recall doing it. 
>  Either way, since you seem to also be bothered by this, I agree we should 
> change it.

Can you review this version?

-- 
Robert Haas
EDB: http://www.enterprisedb.com


simply-remove-chunkno-concept-v3.patch
Description: Binary data


Re: pg_amcheck contrib application

2021-04-30 Thread Mark Dilger



> On Apr 30, 2021, at 9:39 AM, Robert Haas  wrote:
> 
> On Mon, Apr 26, 2021 at 1:52 PM Mark Dilger
>  wrote:
>> The attached patch changes amcheck corruption reports as discussed upthread. 
>>  This patch is submitted for the v14 development cycle as a bug fix, per 
>> your complaint that the committed code generates reports sufficiently 
>> confusing to a user as to constitute a bug.
>> 
>> All other code refactoring and additional checks discussed upthread are 
>> reserved for the v15 development cycle and are not included here.
>> 
>> The minimal patch (not attached) that does not rename any variables is 135 
>> lines.  Your patch was 159 lines.  The patch (attached) which includes your 
>> variable renaming is 174 lines.
> 
> Hi,
> 
> I have compared this against my version. I found the following differences:

Just to be clear, I did not use your patch v1 as the starting point.  I took 
the code as committed to master as the starting point, used your corruption 
report verbiage changes and at least some of your variable naming choices, but 
did not use the rest, in large part because it didn't work.  It caused 
corruption messages to be reported against tables that have no corruption.  For 
that matter, your v2 patch doesn't work either, and in the same way.  To wit:

  heap table "postgres"."pg_catalog"."pg_rewrite", block 6, offset 4, attribute 
7:
 toast value 13461 chunk 0 has size 1995, but expected size 1996

I think there is something wrong with the way you are trying to calculate and 
use extsize, because I'm not corrupting pg_catalog.pg_rewrite.  You can get 
these same results by applying your patch to master, building, and running 
'make check' from src/bin/pg_amcheck/


> 1. This version passes last_chunk_seq rather than extsize to
> check_toast_tuple(). But this results in having to call
> VARATT_EXTERNAL_GET_EXTSIZE() inside that function. I thought it was
> nicer to do that in the caller, so that we don't do it twice.

I don't see that VARATT_EXTERNAL_GET_EXTSIZE() is worth too much concern, given 
that is just a struct access and a bit mask.  You are avoiding calculating that 
twice, but at the expense of calculating last_chunk_seq twice, which involves 
division.  I don't think the division can be optimized as a mere bit shift, 
since TOAST_MAX_CHUNK_SIZE is not in general a power of two.  (For example, on 
my laptop it is 1996.)

I don't say this to nitpick at the performance one way vs. the other.  I doubt 
it makes any real difference.  I'm just confused why you want to change this 
particular thing right now, given that it is not a bug.

> 2. You fixed some out-of-date comments.

Yes, because they were wrong.  That's on me.  I failed to update them in a 
prior patch.

> 3. You move the test for an unexpected chunk sequence further down in
> the function. I don't see the point;

Relative to your patch, perhaps.  Relative to master, no tests have been moved.

> I had put it by the related null
> check, and still think that's better. You also deleted my comment /*
> Either the TOAST index is corrupt, or we don't have all chunks. */
> which I would have preferred to keep.

That's fine.  I didn't mean to remove it.  I was just taking a minimalist 
approach to constructing the patch.

> 4. You don't return if chunk_seq > last_chunk_seq. That seems wrong,
> because we cannot compute a sensible expected size in that case. I
> think your code will subtract a larger value from a smaller one and,
> this being unsigned arithmetic, say that the expected chunk size is
> something gigantic.

Your conclusion is probably right, but I think your analysis is based on a 
misreading of what "last_chunk_seq" means.  It's not the last one seen, but the 
last one expected.  (Should we rename the variable to avoid confusion?)  It 
won't compute a gigantic size.  Rather, it will expect *every* chunk with 
chunk_seq >= last_chunk_seq to have whatever size is appropriate for the last 
chunk. 

> Returning and not issuing that complaint at all
> seems better.

That might be best.  I had been resisting that because I don't want the 
extraneous chunks to be reported without chunk size information.  When 
debugging corrupted toast, it may be interesting to know the size of the 
extraneous chunks.  If there are 1000 extra chunks, somebody might want to see 
the sizes of them.

> 5. You fixed the incorrect formula I had introduced for the expected
> size of the last chunk.

Not really.  I just didn't introduce any change in that area.

> 6. You changed the variable name in check_toasted_attribute() from
> expected_chunkno to chunkno, and initialized it later in the function
> instead of at declaration time. I don't find this to be an
> improvement;

I think I just left the variable name and its initialization unchanged.

> including the word "expected" seems to me to be
> substantially clearer. But I think I should have gone with
> expected_chunk_seq for better consistency.

I agree that is a better name.


Re: pg_amcheck contrib application

2021-04-30 Thread Robert Haas
On Mon, Apr 26, 2021 at 1:52 PM Mark Dilger
 wrote:
> The attached patch changes amcheck corruption reports as discussed upthread.  
> This patch is submitted for the v14 development cycle as a bug fix, per your 
> complaint that the committed code generates reports sufficiently confusing to 
> a user as to constitute a bug.
>
> All other code refactoring and additional checks discussed upthread are 
> reserved for the v15 development cycle and are not included here.
>
> The minimal patch (not attached) that does not rename any variables is 135 
> lines.  Your patch was 159 lines.  The patch (attached) which includes your 
> variable renaming is 174 lines.

Hi,

I have compared this against my version. I found the following differences:

1. This version passes last_chunk_seq rather than extsize to
check_toast_tuple(). But this results in having to call
VARATT_EXTERNAL_GET_EXTSIZE() inside that function. I thought it was
nicer to do that in the caller, so that we don't do it twice.

2. You fixed some out-of-date comments.

3. You move the test for an unexpected chunk sequence further down in
the function. I don't see the point; I had put it by the related null
check, and still think that's better. You also deleted my comment /*
Either the TOAST index is corrupt, or we don't have all chunks. */
which I would have preferred to keep.

4. You don't return if chunk_seq > last_chunk_seq. That seems wrong,
because we cannot compute a sensible expected size in that case. I
think your code will subtract a larger value from a smaller one and,
this being unsigned arithmetic, say that the expected chunk size is
something gigantic. Returning and not issuing that complaint at all
seems better.

5. You fixed the incorrect formula I had introduced for the expected
size of the last chunk.

6. You changed the variable name in check_toasted_attribute() from
expected_chunkno to chunkno, and initialized it later in the function
instead of at declaration time. I don't find this to be an
improvement; including the word "expected" seems to me to be
substantially clearer. But I think I should have gone with
expected_chunk_seq for better consistency.

7. You restored the message "toast value %u was expected to end at
chunk %d, but ended at chunk %d" which my version deleted. I deleted
that message because I thought it was redundant, but I guess it's not:
there's nothing else to complain about if the sequence of chunks ends early.
I think we should change the test from != to < though, because if it's
>, then we must have already complained about unexpected chunks. Also,
I think the message is actually wrong, because even though you renamed
the variable, it still ends up being the expected next chunkno rather
than the last chunkno we actually saw.

PFA my counter-proposal based on the above analysis.

-- 
Robert Haas
EDB: http://www.enterprisedb.com


simply-remove-chunkno-concept-v2.patch
Description: Binary data


Re: pg_amcheck contrib application

2021-04-26 Thread Mark Dilger


> On Apr 23, 2021, at 3:01 PM, Mark Dilger  wrote:
> 
> I'll try to post something that accomplishes the changes to the reports that 
> you are looking for.

The attached patch changes amcheck corruption reports as discussed upthread.  
This patch is submitted for the v14 development cycle as a bug fix, per your 
complaint that the committed code generates reports sufficiently confusing to a 
user as to constitute a bug.

All other code refactoring and additional checks discussed upthread are 
reserved for the v15 development cycle and are not included here.

The minimal patch (not attached) that does not rename any variables is 135 
lines.  Your patch was 159 lines.  The patch (attached) which includes your 
variable renaming is 174 lines.



v23-0001-amcheck-adjusting-corruption-report-output.patch
Description: Binary data


—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company





Re: pg_amcheck contrib application

2021-04-23 Thread Mark Dilger



> On Apr 23, 2021, at 1:31 PM, Robert Haas  wrote:
> 
> Perhaps something like this, closer to the way you had it?
> 
>   expected_size = chunk_seq < last_chunk_seq ? TOAST_MAX_CHUNK_SIZE
>   : extsize - (last_chunk_seq * TOAST_MAX_CHUNK_SIZE);

It still suffers the same failures.  I'll try to post something that 
accomplishes the changes to the reports that you are looking for.

—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company







Re: pg_amcheck contrib application

2021-04-23 Thread Robert Haas
On Fri, Apr 23, 2021 at 2:36 PM Mark Dilger
 wrote:
> > What's different?
>
> for one thing, if a sequence of chunks happens to fit perfectly, the final 
> chunk will have size TOAST_MAX_CHUNK_SIZE, but you're expecting no larger 
> than one less than that, given how modulo arithmetic works.

Good point.

Perhaps something like this, closer to the way you had it?

   expected_size = chunk_seq < last_chunk_seq ? TOAST_MAX_CHUNK_SIZE
   : extsize - (last_chunk_seq * TOAST_MAX_CHUNK_SIZE);

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: pg_amcheck contrib application

2021-04-23 Thread Mark Dilger



> On Apr 23, 2021, at 11:29 AM, Robert Haas  wrote:
> 
> +   expected_size = chunk_seq < last_chunk_seq ? TOAST_MAX_CHUNK_SIZE
> +   : extsize % TOAST_MAX_CHUNK_SIZE;
> 
> What's different?

for one thing, if a sequence of chunks happens to fit perfectly, the final 
chunk will have size TOAST_MAX_CHUNK_SIZE, but you're expecting no larger than 
one less than that, given how modulo arithmetic works.

—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company







Re: pg_amcheck contrib application

2021-04-23 Thread Mark Dilger



> On Apr 23, 2021, at 11:29 AM, Robert Haas  wrote:
> 
> On Fri, Apr 23, 2021 at 2:15 PM Mark Dilger
>  wrote:
>> Another difference I should probably mention is that a bunch of unrelated 
>> tests are failing with errors like:
>> 
>>toast value 13465 chunk 0 has size 1995, but expected size 1996
>> 
>> which leads me to suspect your changes to how the size is calculated.
> 
> That seems like a pretty reasonable suspicion, but I can't see the problem:
> 
> -   expected_size = curchunk < endchunk ? TOAST_MAX_CHUNK_SIZE
> -   : VARATT_EXTERNAL_GET_EXTSIZE(ta->toast_pointer) -
> (endchunk * TOAST_MAX_CHUNK_SIZE);
> +   expected_size = chunk_seq < last_chunk_seq ? TOAST_MAX_CHUNK_SIZE
> +   : extsize % TOAST_MAX_CHUNK_SIZE;
> 
> What's different?
> 
> 1. The variables are renamed.
> 
> 2. It uses a new variable extsize instead of recomputing
> VARATT_EXTERNAL_GET_EXTSIZE(ta->toast_pointer), but I think that
> should have the same value.
> 
> 3. I used modulo arithmetic (%) instead of subtracting endchunk *
> TOAST_MAX_CHUNK_SIZE.
> 
> Is TOAST_MAX_CHUNK_SIZE 1996? How long a value did you insert?

On  my laptop, yes, 1996 is TOAST_MAX_CHUNK_SIZE.

I'm not inserting anything.  These failures come from just regular tests that I 
have not changed.  I just applied your patch and ran `make check-world` and 
these fail in src/bin/pg_amcheck.

—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company







Re: pg_amcheck contrib application

2021-04-23 Thread Robert Haas
On Fri, Apr 23, 2021 at 2:15 PM Mark Dilger
 wrote:
> Another difference I should probably mention is that a bunch of unrelated 
> tests are failing with errors like:
>
> toast value 13465 chunk 0 has size 1995, but expected size 1996
>
> which leads me to suspect your changes to how the size is calculated.

That seems like a pretty reasonable suspicion, but I can't see the problem:

-   expected_size = curchunk < endchunk ? TOAST_MAX_CHUNK_SIZE
-   : VARATT_EXTERNAL_GET_EXTSIZE(ta->toast_pointer) -
(endchunk * TOAST_MAX_CHUNK_SIZE);
+   expected_size = chunk_seq < last_chunk_seq ? TOAST_MAX_CHUNK_SIZE
+   : extsize % TOAST_MAX_CHUNK_SIZE;

What's different?

1. The variables are renamed.

2. It uses a new variable extsize instead of recomputing
VARATT_EXTERNAL_GET_EXTSIZE(ta->toast_pointer), but I think that
should have the same value.

3. I used modulo arithmetic (%) instead of subtracting endchunk *
TOAST_MAX_CHUNK_SIZE.

Is TOAST_MAX_CHUNK_SIZE 1996? How long a value did you insert?

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: pg_amcheck contrib application

2021-04-23 Thread Mark Dilger



> On Apr 23, 2021, at 11:05 AM, Mark Dilger  
> wrote:
> 
> Here are the differences between master and your patch:

Another difference I should probably mention is that a bunch of unrelated tests 
are failing with errors like:

toast value 13465 chunk 0 has size 1995, but expected size 1996

which leads me to suspect your changes to how the size is calculated.

—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company







Re: pg_amcheck contrib application

2021-04-23 Thread Robert Haas
On Fri, Apr 23, 2021 at 2:05 PM Mark Dilger
 wrote:
> Here are the differences between master and your patch:

Thanks. Those messages look reasonable to me.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: pg_amcheck contrib application

2021-04-23 Thread Mark Dilger



> On Apr 23, 2021, at 10:31 AM, Mark Dilger  
> wrote:
> 
> I will test your patch and see what differs.

Here are the differences between master and your patch:

UPDATE $toastname SET chunk_seq = chunk_seq + 1000 WHERE chunk_id = 
$value_id_to_corrupt

-   qr/${header}toast value 16459 chunk 0 has sequence 
number 1000, but expected sequence number 0/,
-   qr/${header}toast value 16459 chunk 1 has sequence 
number 1001, but expected sequence number 1/,
-   qr/${header}toast value 16459 chunk 2 has sequence 
number 1002, but expected sequence number 2/,
-   qr/${header}toast value 16459 chunk 3 has sequence 
number 1003, but expected sequence number 3/,
-   qr/${header}toast value 16459 chunk 4 has sequence 
number 1004, but expected sequence number 4/,
-   qr/${header}toast value 16459 chunk 5 has sequence 
number 1005, but expected sequence number 5/;
+   qr/${header}toast value 16459 index scan returned chunk 1000 
when expecting chunk 0/,
+   qr/${header}toast value 16459 chunk 1000 follows last expected 
chunk 5/,
+   qr/${header}toast value 16459 chunk 1001 follows last expected 
chunk 5/,
+   qr/${header}toast value 16459 chunk 1002 follows last expected 
chunk 5/,
+   qr/${header}toast value 16459 chunk 1003 follows last expected 
chunk 5/,
+   qr/${header}toast value 16459 chunk 1004 follows last expected 
chunk 5/,
+   qr/${header}toast value 16459 chunk 1005 follows last expected 
chunk 5/;

UPDATE $toastname SET chunk_seq = chunk_seq * 1000 WHERE chunk_id = 
$value_id_to_corrupt

-   qr/${header}toast value $value_id_to_corrupt chunk 1 
has sequence number 1000, but expected sequence number 1/,
-   qr/${header}toast value $value_id_to_corrupt chunk 2 
has sequence number 2000, but expected sequence number 2/,
-   qr/${header}toast value $value_id_to_corrupt chunk 3 
has sequence number 3000, but expected sequence number 3/,
-   qr/${header}toast value $value_id_to_corrupt chunk 4 
has sequence number 4000, but expected sequence number 4/,
-   qr/${header}toast value $value_id_to_corrupt chunk 5 
has sequence number 5000, but expected sequence number 5/;
-
+   qr/${header}toast value 16460 index scan returned chunk 1000 
when expecting chunk 1/,
+   qr/${header}toast value 16460 chunk 1000 follows last expected 
chunk 5/,
+   qr/${header}toast value 16460 index scan returned chunk 2000 
when expecting chunk 1001/,
+   qr/${header}toast value 16460 chunk 2000 follows last expected 
chunk 5/,
+   qr/${header}toast value 16460 index scan returned chunk 3000 
when expecting chunk 2001/,
+   qr/${header}toast value 16460 chunk 3000 follows last expected 
chunk 5/,
+   qr/${header}toast value 16460 index scan returned chunk 4000 
when expecting chunk 3001/,
+   qr/${header}toast value 16460 chunk 4000 follows last expected 
chunk 5/,
+   qr/${header}toast value 16460 index scan returned chunk 5000 
when expecting chunk 4001/,
+   qr/${header}toast value 16460 chunk 5000 follows last expected 
chunk 5/;

INSERT INTO $toastname (chunk_id, chunk_seq, chunk_data)
(SELECT chunk_id,
10*chunk_seq + 1000,
chunk_data
FROM $toastname
WHERE chunk_id = $value_id_to_corrupt)

-   qr/${header}toast value $value_id_to_corrupt chunk 6 has sequence number 1000, but expected sequence number 6/,
-   qr/${header}toast value $value_id_to_corrupt chunk 7 has sequence number 1010, but expected sequence number 7/,
-   qr/${header}toast value $value_id_to_corrupt chunk 8 has sequence number 1020, but expected sequence number 8/,
-   qr/${header}toast value $value_id_to_corrupt chunk 9 has sequence number 1030, but expected sequence number 9/,
-   qr/${header}toast value $value_id_to_corrupt chunk 10 has sequence number 1040, but expected sequence number 10/,
-   qr/${header}toast value $value_id_to_corrupt chunk 11 has sequence number 1050, but expected sequence number 11/,
-   qr/${header}toast value $value_id_to_corrupt was expected to end at chunk 6, but ended at chunk 12/;
+  qr/${header}toast value $value_id_to_corrupt index scan returned chunk 1000 when expecting chunk 6/,
+  qr/${header}toast value $value_id_to_corrupt chunk 1000 follows last expected chunk 5/,
+  qr/${header}toast value $value_id_to_corrupt index scan returned chunk 1010 when expecting chunk 1001/,
+  qr/${header}toast value $value_id_to_corrupt chunk 1010 follows last expected chunk 5/,
+  

Re: pg_amcheck contrib application

2021-04-23 Thread Mark Dilger



> On Apr 23, 2021, at 10:28 AM, Robert Haas  wrote:
> 
> On Thu, Apr 22, 2021 at 7:28 PM Mark Dilger
>  wrote:
>> I have refactored the patch to address your other concerns.  Breaking the 
>> patch into multiple pieces didn't add any clarity, but refactoring portions 
>> of it made things simpler to read, I think, so here it is as one patch file.
> 
> I was hoping that this version was going to be smaller than the last
> version, but instead it went from 300+ lines to 500+ lines.
> 
> The main thing I'm unhappy about in the status quo is the use of
> chunkno in error messages. I have suggested several times making that
> concept go away, because I think users will be confused. Here's a
> minimal patch that does just that. It's 32 lines and results in a net
> removal of 4 lines. It differs somewhat from my earlier suggestions,
> because my priority here is to get reasonably understandable output
> without needing a ton of code, and as I was working on this I found
> that some of my earlier suggestions would have needed more code to
> implement and I didn't think it bought enough to be worth it. It's
> possible this is too simple, or that it's buggy, so let me know what
> you think. But basically, I think what got committed before is
> actually mostly fine and doesn't need major revision. It just needs
> tidying up to avoid the confusing chunkno concept.
> 
> Now, the other thing we've talked about is adding a few more checks,
> to verify for example that the toastrelid is what we expect, and I see
> in your v22 you thought of a few other things. I think we can consider
> those, possibly as things where we consider it tidying up loose ends
> for v14, or else as improvements for v15. But I don't think that the
> fairly large size of your patch comes primarily from additional
> checks. I think it mostly comes from the code to produce error reports
> getting a lot more complicated. I apologize if my comments have driven
> that complexity, but they weren't intended to.
> 
> One tiny problem with the attached patch is that it does not make any
> regression tests fail, which also makes it hard for me to tell if it
> breaks anything, or if the existing code works. I don't know how
> practical it is to do anything about that. Do you have a patch handy
> that allows manual updates and deletes on TOAST tables, for manual
> testing purposes?

Yes. I haven't been posting that with the patch, but I will test your 
patch and see what differs.

—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company







Re: pg_amcheck contrib application

2021-04-23 Thread Robert Haas
On Thu, Apr 22, 2021 at 7:28 PM Mark Dilger
 wrote:
> I have refactored the patch to address your other concerns.  Breaking the 
> patch into multiple pieces didn't add any clarity, but refactoring portions 
> of it made things simpler to read, I think, so here it is as one patch file.

I was hoping that this version was going to be smaller than the last
version, but instead it went from 300+ lines to 500+ lines.

The main thing I'm unhappy about in the status quo is the use of
chunkno in error messages. I have suggested several times making that
concept go away, because I think users will be confused. Here's a
minimal patch that does just that. It's 32 lines and results in a net
removal of 4 lines. It differs somewhat from my earlier suggestions,
because my priority here is to get reasonably understandable output
without needing a ton of code, and as I was working on this I found
that some of my earlier suggestions would have needed more code to
implement and I didn't think it bought enough to be worth it. It's
possible this is too simple, or that it's buggy, so let me know what
you think. But basically, I think what got committed before is
actually mostly fine and doesn't need major revision. It just needs
tidying up to avoid the confusing chunkno concept.

Now, the other thing we've talked about is adding a few more checks,
to verify for example that the toastrelid is what we expect, and I see
in your v22 you thought of a few other things. I think we can consider
those, possibly as things where we consider it tidying up loose ends
for v14, or else as improvements for v15. But I don't think that the
fairly large size of your patch comes primarily from additional
checks. I think it mostly comes from the code to produce error reports
getting a lot more complicated. I apologize if my comments have driven
that complexity, but they weren't intended to.

One tiny problem with the attached patch is that it does not make any
regression tests fail, which also makes it hard for me to tell if it
breaks anything, or if the existing code works. I don't know how
practical it is to do anything about that. Do you have a patch handy
that allows manual updates and deletes on TOAST tables, for manual
testing purposes?

-- 
Robert Haas
EDB: http://www.enterprisedb.com


simply-remove-chunkno-concept.patch
Description: Binary data


Re: pg_amcheck contrib application

2021-04-22 Thread Mark Dilger


> On Apr 19, 2021, at 5:07 PM, Mark Dilger  wrote:
> 
> 
> 
>> On Apr 19, 2021, at 12:50 PM, Robert Haas  wrote:
>> 
>> On Thu, Apr 15, 2021 at 1:07 PM Mark Dilger
>>  wrote:
>>> I have added the verb "has" rather than "contains" because "has" is more 
>>> consistent with the phrasing of other similar corruption reports.
>> 
>> That makes sense.

I have refactored the patch to address your other concerns.  Breaking the patch 
into multiple pieces didn't add any clarity, but refactoring portions of it 
made things simpler to read, I think, so here it is as one patch file.



v22-0001-amcheck-adding-toast-pointer-corruption-checks.patch
Description: Binary data


—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company





Re: pg_amcheck contrib application

2021-04-19 Thread Mark Dilger



> On Apr 19, 2021, at 12:50 PM, Robert Haas  wrote:
> 
> On Thu, Apr 15, 2021 at 1:07 PM Mark Dilger
>  wrote:
>> I have added the verb "has" rather than "contains" because "has" is more 
>> consistent with the phrasing of other similar corruption reports.
> 
> That makes sense.
> 
> I think it's odd that a range of extraneous chunks is collapsed into a
> single report if the size of each chunk happens to be
> TOAST_MAX_CHUNK_SIZE and not otherwise. Why not just remember the
> first and last extraneous chunk and the size of each? If the next
> chunk you see is the next one in sequence and the same size as all the
> others, extend your notion of the sequence end by 1. Otherwise, report
> the range accumulated so far. It seems to me that this wouldn't be any
> more code than you have now, and might actually be less.

In all cases of uncorrupted toasted attributes, the sequence of N chunks that 
make up the attribute should be N-1 chunks of TOAST_MAX_CHUNK_SIZE ending with 
a single chunk of up to TOAST_MAX_CHUNK_SIZE.  I'd like to refer to such 
sequences as "reasonably sized" sequences to make conversation easier.

If the toast pointer's va_extsize field leads us to believe that we should find 
10 reasonably sized chunks, but instead we find 30 reasonably sized chunks, we 
know something is corrupt.  We shouldn't automatically prejudice the user 
against the additional 20 chunks.  We didn't expect them, but maybe that's 
because va_extsize was corrupt and gave us a false expectation.  We're not 
pointing fingers one way or the other.

On the other hand, if we expect 10 chunks and find an additional 20 
unreasonably sized chunks, we can and should point fingers at the extra 20 
chunks.  Even if we somehow knew that va_extsize was also corrupt, we'd still 
be justified in saying the 20 unreasonably sized chunks are each, individually 
corrupt.

I tried to write the code to report one corruption message per corruption 
found.  There are some edge cases where this is a definitional challenge, so 
it's not easy to say that I've always achieved this goal, but I think I've done 
so where the definitions are clear.  As such, the only time I'd want to combine 
toast chunks into a single corruption message is when they are not in 
themselves necessarily *individually* corrupt.  That is why I wrote the code to 
use TOAST_MAX_CHUNK_SIZE rather than just storing up any series of equally 
sized chunks.

On a related note, when complaining about a sequence of toast chunks, often the 
sequence is something like [maximal, maximal, ..., maximal, partial], but 
sometimes it's just [maximal...maximal], sometimes just [maximal], and 
sometimes just [partial].  If I'm complaining about that entire sequence, I'd 
really like to do so in just one message, otherwise it looks like separate 
complaints.

I can certainly change the code to be how you are asking, but I'd first like to 
know that you really understood what I was doing here and why the reports read 
the way they do.

> I think that report_missing_chunks() could probably just report the
> range of missing chunks and not bother reporting how big they were
> expected to be. But, if it is going to report how big they were
> expected to be, I think it should have only 2 cases rather than 4:
> either a range of missing chunks of equal size, or a single missing
> chunk of some size. If, as I propose, it doesn't report the expected
> size, then you still have just 2 cases: a range of missing chunks, or
> a single missing chunk.

Right, this is the same as above.  I'm trying not to split a single corruption 
complaint into separate reports.

> Somehow I have a hard time feeling confident that check_toast_tuple()
> is going to do the right thing. The logic is really complex and hard
> to understand. 'chunkno' is a counter that advances every time we move
> to the next chunk, and 'curchunk' is the value we actually find in the
> TOAST tuple. This terminology is not easy to understand. Most messages
> now report 'curchunk', but some still report 'chunkno'. Why does
> 'chunkno' need to exist at all? AFAICS the combination of 'curchunk'
> and 'tctx->last_chunk_seen' ought to be sufficient. I can see no
> particular reason why what you're calling 'chunkno' needs to exist
> even as a local variable, let alone be printed out. Either we haven't
> yet validated that the chunk_id extracted from the tuple is non-null
> and greater than the last chunk number we saw, in which case we can
> just complain about it if we find it to be otherwise, or we have
> already done that validation, in which case we should complain about
> that value and not 'chunkno' in any subsequent messages.

If we use tctx->last_chunk_seen as you propose, I imagine we'd set that to -1 
prior to the first call to check_toast_tuple().  In the first call, we'd 
extract the toast chunk_seq and store it in curchunk and verify that it's one 
greater than tctx->last_chunk_seen.  That all seems fine.

But under corrupt 

Re: pg_amcheck contrib application

2021-04-19 Thread Robert Haas
On Thu, Apr 15, 2021 at 1:07 PM Mark Dilger
 wrote:
> I have added the verb "has" rather than "contains" because "has" is more 
> consistent with the phrasing of other similar corruption reports.

That makes sense.

I think it's odd that a range of extraneous chunks is collapsed into a
single report if the size of each chunk happens to be
TOAST_MAX_CHUNK_SIZE and not otherwise. Why not just remember the
first and last extraneous chunk and the size of each? If the next
chunk you see is the next one in sequence and the same size as all the
others, extend your notion of the sequence end by 1. Otherwise, report
the range accumulated so far. It seems to me that this wouldn't be any
more code than you have now, and might actually be less.
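The accumulation scheme proposed here can be sketched in a few lines of C. This is a minimal illustration only: the type and function names (`ExtraChunkRun`, `note_extraneous_chunk`, `flush_run`) are hypothetical stand-ins, not amcheck's actual code, and the "report" is modeled as a counter rather than a real corruption message.

```c
#include <assert.h>

/* Pending run of extraneous chunks: first/last chunk_seq and common size.
 * first_seq < 0 means no run is pending.  Names are illustrative only. */
typedef struct
{
	int		first_seq;
	int		last_seq;
	int		size;
	int		reports;		/* number of range reports flushed so far */
} ExtraChunkRun;

/* Emit one report covering first_seq..last_seq, if a run is pending. */
static void
flush_run(ExtraChunkRun *run)
{
	if (run->first_seq >= 0)
		run->reports++;		/* "unexpected chunks %d through %d, each size %d" */
	run->first_seq = -1;
}

/* Extend the pending run if this chunk is the next in sequence and the
 * same size as the others; otherwise flush and start a new run. */
static void
note_extraneous_chunk(ExtraChunkRun *run, int seq, int size)
{
	if (run->first_seq >= 0 && seq == run->last_seq + 1 && size == run->size)
	{
		run->last_seq = seq;
		return;
	}
	flush_run(run);
	run->first_seq = seq;
	run->last_seq = seq;
	run->size = size;
}
```

With this shape, a run of equally sized consecutive chunks collapses into a single report regardless of whether the common size happens to be TOAST_MAX_CHUNK_SIZE.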

I think that report_missing_chunks() could probably just report the
range of missing chunks and not bother reporting how big they were
expected to be. But, if it is going to report how big they were
expected to be, I think it should have only 2 cases rather than 4:
either a range of missing chunks of equal size, or a single missing
chunk of some size. If, as I propose, it doesn't report the expected
size, then you still have just 2 cases: a range of missing chunks, or
a single missing chunk.

Somehow I have a hard time feeling confident that check_toast_tuple()
is going to do the right thing. The logic is really complex and hard
to understand. 'chunkno' is a counter that advances every time we move
to the next chunk, and 'curchunk' is the value we actually find in the
TOAST tuple. This terminology is not easy to understand. Most messages
now report 'curchunk', but some still report 'chunkno'. Why does
'chunkno' need to exist at all? AFAICS the combination of 'curchunk'
and 'tctx->last_chunk_seen' ought to be sufficient. I can see no
particular reason why what you're calling 'chunkno' needs to exist
even as a local variable, let alone be printed out. Either we haven't
yet validated that the chunk_id extracted from the tuple is non-null
and greater than the last chunk number we saw, in which case we can
just complain about it if we find it to be otherwise, or we have
already done that validation, in which case we should complain about
that value and not 'chunkno' in any subsequent messages.

The conditionals between where you set max_valid_prior_chunk and where
you set last_chunk_seen seem hard to understand, particularly the
bifurcated way that missing chunks are reported. Initial missing
chunks are detected by (chunkno == 0 && max_valid_prior_chunk >= 0)
and later missing chunks are detected by (chunkno > 0 &&
max_valid_prior_chunk > tctx->last_chunk_seen). I'm not sure if this
is correct; I find it hard to get my head around what
max_valid_prior_chunk is supposed to represent. But in any case I
think it can be written more simply. Just keep track of what chunk_id
we expect to extract from the next TOAST tuple. Initially it's 0.
Then:

if (chunkno < tctx->expected_chunkno)
{
   // toast value %u index scan returned chunk number %d when chunk %d
was expected
   // don't modify tctx->expected_chunkno here, just hope the next
thing matches our previous expectation
}
else
{
if (chunkno > tctx->expected_chunkno)
// chunks are missing from tctx->expected_chunkno through
Min(chunkno - 1, tctx->final_expected_chunk), provided that the latter
value is greater than or equal to the former
tctx->expected_chunkno = chunkno + 1;
}

If you do this, you only need to report extraneous chunks when chunkno
> tctx->final_expected_chunk, since chunkno < 0 is guaranteed to
trigger the first of the two complaints shown above.
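The pseudocode above can be fleshed out into a compilable sketch. Everything here is illustrative: `ToastCheckSketch` stands in for the real toast-check context struct, and complaints are counted rather than reported, but the control flow is the one described, tracking only the next expected chunk_seq.

```c
#include <assert.h>

/* Hypothetical stand-in for the toast-check context discussed above. */
typedef struct
{
	int		expected_chunkno;		/* next chunk_seq we expect to see */
	int		final_expected_chunk;	/* last chunk_seq implied by va_extsize */
	int		complaints;				/* complaints emitted (stand-in for reports) */
} ToastCheckSketch;

static int
min_int(int a, int b)
{
	return a < b ? a : b;
}

/* Process one chunk_seq returned by the index scan. */
static void
observe_chunk(ToastCheckSketch *tctx, int chunkno)
{
	if (chunkno < tctx->expected_chunkno)
	{
		/* "index scan returned chunk %d when chunk %d was expected" */
		tctx->complaints++;
		/* don't modify expected_chunkno; hope the next tuple matches */
	}
	else
	{
		if (chunkno > tctx->expected_chunkno)
		{
			int		last_missing = min_int(chunkno - 1,
										   tctx->final_expected_chunk);

			/* "chunks %d through %d are missing", if the range is nonempty */
			if (last_missing >= tctx->expected_chunkno)
				tctx->complaints++;
		}
		tctx->expected_chunkno = chunkno + 1;
	}
}
```

Extraneous chunks then only need a separate complaint when chunkno exceeds final_expected_chunk, since any backward jump already trips the first branch.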

In check_tuple_attribute I suggest "bool valid = false" rather than
"bool invalid = true". I think it's easier to understand.

I object to check_toasted_attribute() using 'chunkno' in a message for
the same reasons as above in regards to check_toast_tuple() i.e. I
think it's a concept which should not exist.

I think this patch could possibly be split up into multiple patches.
There's some question in my mind whether it's getting too late to
commit any of this, since some of it looks suspiciously like new
features after feature freeze. However, I kind of hate to ship this
release without at least doing something about the chunkno vs.
curchunk stuff, which is even worse in the committed code than in your
patch, and which I think will confuse the heck out of users if those
messages actually fire for anyone.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: pg_amcheck contrib application

2021-04-15 Thread Mark Dilger


> On Apr 14, 2021, at 10:17 AM, Robert Haas  wrote:
> 
> On Mon, Apr 12, 2021 at 11:06 PM Mark Dilger
>  wrote:
>> It now reports:
>> 
>> # heap table "postgres"."public"."test", block 0, offset 18, attribute 2:
>> # toast value 16461 missing chunk 3 with expected size 1996
>> # heap table "postgres"."public"."test", block 0, offset 18, attribute 2:
>> # toast value 16461 was expected to end at chunk 6 with total size 
>> 1, but ended at chunk 5 with total size 8004
>> 
>> It sounds like you weren't expecting the second of these reports.  I think 
>> it is valuable, especially when there are multiple missing chunks and 
>> multiple extraneous chunks, as it makes it easier for the user to reconcile 
>> the missing chunks against the extraneous chunks.
> 
> I wasn't, but I'm not overwhelmingly opposed to it, either. I do think
> I would be in favor of splitting this kind of thing up into two
> messages:
> 
> # toast value 16459 unexpected chunks 1000 through 1004 each with
> size 1996 followed by chunk 1005 with size 20
> 
> We'll have fewer message variants and I don't think any real
> regression in usability if we say:
> 
> # toast value 16459 has unexpected chunks 1000 through 1004 each
> with size 1996
> # toast value 16459 has unexpected chunk 1005 with size 20

Changed.

> (Notice that I also inserted "has" so that the sentence has a verb. Or we
> could use "contains.")

I have added the verb "has" rather than "contains" because "has" is more 
consistent with the phrasing of other similar corruption reports.



v21-0001-amcheck-adding-toast-pointer-corruption-checks.patch
Description: Binary data

—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company





Re: pg_amcheck contrib application

2021-04-14 Thread Robert Haas
On Mon, Apr 12, 2021 at 11:06 PM Mark Dilger
 wrote:
> It now reports:
>
> # heap table "postgres"."public"."test", block 0, offset 18, attribute 2:
> # toast value 16461 missing chunk 3 with expected size 1996
> # heap table "postgres"."public"."test", block 0, offset 18, attribute 2:
> # toast value 16461 was expected to end at chunk 6 with total size 1, 
> but ended at chunk 5 with total size 8004
>
> It sounds like you weren't expecting the second of these reports.  I think it 
> is valuable, especially when there are multiple missing chunks and multiple 
> extraneous chunks, as it makes it easier for the user to reconcile the 
> missing chunks against the extraneous chunks.

I wasn't, but I'm not overwhelmingly opposed to it, either. I do think
I would be in favor of splitting this kind of thing up into two
messages:

# toast value 16459 unexpected chunks 1000 through 1004 each with
size 1996 followed by chunk 1005 with size 20

We'll have fewer message variants and I don't think any real
regression in usability if we say:

# toast value 16459 has unexpected chunks 1000 through 1004 each
with size 1996
# toast value 16459 has unexpected chunk 1005 with size 20

(Notice that I also inserted "has" so that the sentence has a verb. Or we
could use "contains.")

I committed 0001.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: pg_amcheck contrib application

2021-04-12 Thread Mark Dilger


> On Apr 9, 2021, at 1:51 PM, Robert Haas  wrote:
> 
> On Fri, Apr 9, 2021 at 2:50 PM Mark Dilger  
> wrote:
>> I think #4, above, requires some clarification.  If there are missing 
>> chunks, the very definition of how large we expect subsequent chunks to be 
>> is ill-defined.  I took a fairly conservative approach to avoid lots of 
>> bogus complaints about chunks that are of unexpected size.   Not all such 
>> complaints are removed, but enough are removed that I needed to add a final 
>> complaint at the end about the total size seen not matching the total size 
>> expected.
> 
> My instinct is to suppose that the size that we expect for future
> chunks is independent of anything being wrong with previous chunks. So
> if each chunk is supposed to be 2004 bytes (which probably isn't the
> real number) and the value is 7000 bytes long, we expect chunks 0-2 to
> be 2004 bytes each, chunk 3 to be 988 bytes, and chunk 4 and higher to
> not exist. If chunk 1 happens to be missing or the wrong length or
> whatever, our expectations for chunks 2 and 3 are utterly unchanged.

Fair enough.

>> Corruption #1:
>> 
>>UPDATE $toastname SET chunk_seq = chunk_seq + 1000
>> 
>> Before:
>> 
>> # heap table "postgres"."public"."test", block 0, offset 2, attribute 2:
>> # toast value 16445 chunk 0 has sequence number 1000, but expected 
>> sequence number 0
>> # heap table "postgres"."public"."test", block 0, offset 2, attribute 2:
>> # toast value 16445 chunk 1 has sequence number 1001, but expected 
>> sequence number 1
>> # heap table "postgres"."public"."test", block 0, offset 2, attribute 2:
>> # toast value 16445 chunk 2 has sequence number 1002, but expected 
>> sequence number 2
>> # heap table "postgres"."public"."test", block 0, offset 2, attribute 2:
>> # toast value 16445 chunk 3 has sequence number 1003, but expected 
>> sequence number 3
>> # heap table "postgres"."public"."test", block 0, offset 2, attribute 2:
>> # toast value 16445 chunk 4 has sequence number 1004, but expected 
>> sequence number 4
>> # heap table "postgres"."public"."test", block 0, offset 2, attribute 2:
>> # toast value 16445 chunk 5 has sequence number 1005, but expected 
>> sequence number 5
>> 
>> After:
>> 
>> # heap table "postgres"."public"."test", block 0, offset 2, attribute 2:
>> # toast value 16445 missing chunks 0 through 999
> 
> Applying the above principle would lead to complaints that chunks 0-5
> are missing, and 1000-1005 are extra.

That sounds right.  It now reports:

# heap table "postgres"."public"."test", block 0, offset 16, attribute 2:
# toast value 16459 missing chunks 0 through 4 with expected size 1996 and 
chunk 5 with expected size 20
# heap table "postgres"."public"."test", block 0, offset 16, attribute 2:
# toast value 16459 unexpected chunks 1000 through 1004 each with size 1996 
followed by chunk 1005 with size 20


>> Corruption #2:
>> 
>>UPDATE $toastname SET chunk_seq = chunk_seq * 1000
> 
> Similarly here, except the extra chunk numbers are different.

It now reports:

# heap table "postgres"."public"."test", block 0, offset 17, attribute 2:
# toast value 16460 missing chunks 1 through 4 with expected size 1996 and 
chunk 5 with expected size 20
# heap table "postgres"."public"."test", block 0, offset 17, attribute 2:
# toast value 16460 unexpected chunk 1000 with size 1996
# heap table "postgres"."public"."test", block 0, offset 17, attribute 2:
# toast value 16460 unexpected chunk 2000 with size 1996
# heap table "postgres"."public"."test", block 0, offset 17, attribute 2:
# toast value 16460 unexpected chunk 3000 with size 1996
# heap table "postgres"."public"."test", block 0, offset 17, attribute 2:
# toast value 16460 unexpected chunk 4000 with size 1996
# heap table "postgres"."public"."test", block 0, offset 17, attribute 2:
# toast value 16460 unexpected chunk 5000 with size 20

I don't see any good way in this case to report the extra chunks in one row, as 
in the general case there could be arbitrarily many of them, with the message 
text getting arbitrarily large if it reported the chunks as "chunks 1000, 2000, 
3000, 4000, 5000, ...".  I don't expect this sort of corruption being 
particularly common, though, so I'm not too bothered about it.

> 
>> Corruption #3:
>> 
>>UPDATE $toastname SET chunk_id = (chunk_id::integer + 1000)::oid 
>> WHERE chunk_seq = 3
> 
> And here we'd just get a complaint that chunk 3 is missing.

It now reports:

# heap table "postgres"."public"."test", block 0, offset 18, attribute 2:
# toast value 16461 missing chunk 3 with expected size 1996
# heap table "postgres"."public"."test", block 0, offset 18, attribute 2:
# toast value 16461 was expected to end at chunk 6 with total size 1, 
but ended at chunk 5 with total size 8004

It sounds like you weren't expecting the second of these reports.  I think it 
is valuable, especially when there are multiple 

Re: pg_amcheck contrib application

2021-04-09 Thread Robert Haas
On Fri, Apr 9, 2021 at 2:50 PM Mark Dilger  wrote:
> I think #4, above, requires some clarification.  If there are missing chunks, 
> the very definition of how large we expect subsequent chunks to be is 
> ill-defined.  I took a fairly conservative approach to avoid lots of bogus 
> complaints about chunks that are of unexpected size.   Not all such 
> complaints are removed, but enough are removed that I needed to add a final 
> complaint at the end about the total size seen not matching the total size 
> expected.

My instinct is to suppose that the size that we expect for future
chunks is independent of anything being wrong with previous chunks. So
if each chunk is supposed to be 2004 bytes (which probably isn't the
real number) and the value is 7000 bytes long, we expect chunks 0-2 to
be 2004 bytes each, chunk 3 to be 988 bytes, and chunk 4 and higher to
not exist. If chunk 1 happens to be missing or the wrong length or
whatever, our expectations for chunks 2 and 3 are utterly unchanged.
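Under that principle, the expected size of any given chunk is a pure function of the value's external size, independent of what earlier chunks looked like. A sketch, where 2004 stands in for TOAST_MAX_CHUNK_SIZE (whose real value is build-dependent) and the function name is hypothetical:

```c
#include <assert.h>

#define SKETCH_MAX_CHUNK_SIZE 2004		/* stand-in for TOAST_MAX_CHUNK_SIZE */

/* Expected size of chunk 'seq' for a toasted value of 'extsize' bytes,
 * or -1 if no such chunk should exist.  All chunks are maximal except the
 * final one, which holds whatever remains. */
static int
expected_chunk_size(int extsize, int seq)
{
	int		total_chunks = (extsize + SKETCH_MAX_CHUNK_SIZE - 1)
		/ SKETCH_MAX_CHUNK_SIZE;

	if (seq < 0 || seq >= total_chunks)
		return -1;
	if (seq == total_chunks - 1)
		return extsize - seq * SKETCH_MAX_CHUNK_SIZE;	/* final, partial */
	return SKETCH_MAX_CHUNK_SIZE;
}
```

For the 7000-byte example this yields 2004 for chunks 0 through 2, 988 for chunk 3, and "should not exist" for chunk 4 and higher, regardless of any corruption in earlier chunks.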

> Corruption #1:
>
> UPDATE $toastname SET chunk_seq = chunk_seq + 1000
>
> Before:
>
> # heap table "postgres"."public"."test", block 0, offset 2, attribute 2:
> # toast value 16445 chunk 0 has sequence number 1000, but expected 
> sequence number 0
> # heap table "postgres"."public"."test", block 0, offset 2, attribute 2:
> # toast value 16445 chunk 1 has sequence number 1001, but expected 
> sequence number 1
> # heap table "postgres"."public"."test", block 0, offset 2, attribute 2:
> # toast value 16445 chunk 2 has sequence number 1002, but expected 
> sequence number 2
> # heap table "postgres"."public"."test", block 0, offset 2, attribute 2:
> # toast value 16445 chunk 3 has sequence number 1003, but expected 
> sequence number 3
> # heap table "postgres"."public"."test", block 0, offset 2, attribute 2:
> # toast value 16445 chunk 4 has sequence number 1004, but expected 
> sequence number 4
> # heap table "postgres"."public"."test", block 0, offset 2, attribute 2:
> # toast value 16445 chunk 5 has sequence number 1005, but expected 
> sequence number 5
>
> After:
>
> # heap table "postgres"."public"."test", block 0, offset 2, attribute 2:
> # toast value 16445 missing chunks 0 through 999

Applying the above principle would lead to complaints that chunks 0-5
are missing, and 1000-1005 are extra.

> Corruption #2:
>
> UPDATE $toastname SET chunk_seq = chunk_seq * 1000

Similarly here, except the extra chunk numbers are different.

> Corruption #3:
>
> UPDATE $toastname SET chunk_id = (chunk_id::integer + 1000)::oid 
> WHERE chunk_seq = 3

And here we'd just get a complaint that chunk 3 is missing.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: pg_amcheck contrib application

2021-04-09 Thread Mark Dilger



> On Apr 8, 2021, at 3:11 PM, Robert Haas  wrote:
> 
> On Thu, Apr 8, 2021 at 5:21 PM Mark Dilger  
> wrote:
>> All this leads me to believe that we should report the following:
>> 
>> 1) If the total number of chunks retrieved differs from the expected number, 
>> report how many we expected vs. how many we got
>> 2) If the chunk_seq numbers are discontiguous, report each discontiguity.
>> 3) If the index scan returned chunks out of chunk_seq order, report that
>> 4) If any chunk is not the expected size, report that
>> 
>> So, for your example of chunk 1 missing from chunks [0..17], we'd report 
>> that we got one fewer chunks than we expected, that the second chunk seen 
>> was discontiguous from the first chunk seen, that the final chunk seen was 
>> smaller than expected by M bytes, and that the total size was smaller than 
>> we expected by N bytes.  The third of those is somewhat misleading, since 
>> the final chunk was presumably the right size; we just weren't expecting to 
>> hit a partial chunk quite yet.  But I don't see how to make that better in 
>> the general case.
> 
> Hmm, that might be OK. It seems like it's going to be a bit verbose in
> simple cases like 1 missing chunk, but on the plus side, it avoids a
> mountain of output if the raw size has been overwritten with a
> gigantic bogus value. But, how is #2 different from #3? Those sound
> like the same thing to me.

I think #4, above, requires some clarification.  If there are missing chunks, 
the very definition of how large we expect subsequent chunks to be is 
ill-defined.  I took a fairly conservative approach to avoid lots of bogus 
complaints about chunks that are of unexpected size.   Not all such complaints 
are removed, but enough are removed that I needed to add a final complaint at 
the end about the total size seen not matching the total size expected.

Here are a set of corruptions with the corresponding corruption reports from 
before and from after the code changes.  The corruptions are *not* cumulative.

Honestly, I'm not totally convinced that these changes are improvements in all 
cases.  Let me know if you want further changes, or if you'd like to see other 
corruptions and their before and after results.


Corruption #1:

UPDATE $toastname SET chunk_seq = chunk_seq + 1000

Before:

# heap table "postgres"."public"."test", block 0, offset 2, attribute 2:
# toast value 16445 chunk 0 has sequence number 1000, but expected sequence 
number 0
# heap table "postgres"."public"."test", block 0, offset 2, attribute 2:
# toast value 16445 chunk 1 has sequence number 1001, but expected sequence 
number 1
# heap table "postgres"."public"."test", block 0, offset 2, attribute 2:
# toast value 16445 chunk 2 has sequence number 1002, but expected sequence 
number 2
# heap table "postgres"."public"."test", block 0, offset 2, attribute 2:
# toast value 16445 chunk 3 has sequence number 1003, but expected sequence 
number 3
# heap table "postgres"."public"."test", block 0, offset 2, attribute 2:
# toast value 16445 chunk 4 has sequence number 1004, but expected sequence 
number 4
# heap table "postgres"."public"."test", block 0, offset 2, attribute 2:
# toast value 16445 chunk 5 has sequence number 1005, but expected sequence 
number 5

After:

# heap table "postgres"."public"."test", block 0, offset 2, attribute 2:
# toast value 16445 missing chunks 0 through 999


Corruption #2:

UPDATE $toastname SET chunk_seq = chunk_seq * 1000

Before:

# heap table "postgres"."public"."test", block 0, offset 2, attribute 2:
# toast value 16445 chunk 1 has sequence number 1000, but expected sequence 
number 1
# heap table "postgres"."public"."test", block 0, offset 2, attribute 2:
# toast value 16445 chunk 2 has sequence number 2000, but expected sequence 
number 2
# heap table "postgres"."public"."test", block 0, offset 2, attribute 2:
# toast value 16445 chunk 3 has sequence number 3000, but expected sequence 
number 3
# heap table "postgres"."public"."test", block 0, offset 2, attribute 2:
# toast value 16445 chunk 4 has sequence number 4000, but expected sequence 
number 4
# heap table "postgres"."public"."test", block 0, offset 2, attribute 2:
# toast value 16445 chunk 5 has sequence number 5000, but expected sequence 
number 5

After:

# heap table "postgres"."public"."test", block 0, offset 2, attribute 2:
# toast value 16445 missing chunks 1 through 999
# heap table "postgres"."public"."test", block 0, offset 2, attribute 2:
# toast value 16445 missing chunks 1001 through 1999
# heap table "postgres"."public"."test", block 0, offset 2, attribute 2:
# toast value 16445 missing chunks 2001 through 2999
# heap table "postgres"."public"."test", block 0, offset 2, attribute 2:
# toast value 16445 missing chunks 3001 through 3999
# heap table "postgres"."public"."test", block 0, offset 2, attribute 2:
# toast value 16445 missing chunks 4001 through 4999
# 

Re: pg_amcheck contrib application

2021-04-08 Thread Robert Haas
On Thu, Apr 8, 2021 at 6:51 PM Mark Dilger  wrote:
> #2 is if chunk_seq goes up but skips numbers.  #3 is if chunk_seq ever goes 
> down, meaning the index scan did something unexpected.

Yeah, sure. But I think we could probably treat those the same way.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: pg_amcheck contrib application

2021-04-08 Thread Mark Dilger



> On Apr 8, 2021, at 3:11 PM, Robert Haas  wrote:
> 
> On Thu, Apr 8, 2021 at 5:21 PM Mark Dilger  
> wrote:
>> All this leads me to believe that we should report the following:
>> 
>> 1) If the total number of chunks retrieved differs from the expected number, 
>> report how many we expected vs. how many we got
>> 2) If the chunk_seq numbers are discontiguous, report each discontiguity.
>> 3) If the index scan returned chunks out of chunk_seq order, report that
>> 4) If any chunk is not the expected size, report that
>> 
>> So, for your example of chunk 1 missing from chunks [0..17], we'd report 
>> that we got one fewer chunks than we expected, that the second chunk seen 
>> was discontiguous from the first chunk seen, that the final chunk seen was 
>> smaller than expected by M bytes, and that the total size was smaller than 
>> we expected by N bytes.  The third of those is somewhat misleading, since 
>> the final chunk was presumably the right size; we just weren't expecting to 
>> hit a partial chunk quite yet.  But I don't see how to make that better in 
>> the general case.
> 
> Hmm, that might be OK. It seems like it's going to be a bit verbose in
> simple cases like 1 missing chunk, but on the plus side, it avoids a
> mountain of output if the raw size has been overwritten with a
> gigantic bogus value. But, how is #2 different from #3? Those sound
> like the same thing to me.

#2 is if chunk_seq goes up but skips numbers.  #3 is if chunk_seq ever goes 
down, meaning the index scan did something unexpected.

—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company







Re: pg_amcheck contrib application

2021-04-08 Thread Robert Haas
On Thu, Apr 8, 2021 at 5:21 PM Mark Dilger  wrote:
> All this leads me to believe that we should report the following:
>
> 1) If the total number of chunks retrieved differs from the expected number, 
> report how many we expected vs. how many we got
> 2) If the chunk_seq numbers are discontiguous, report each discontiguity.
> 3) If the index scan returned chunks out of chunk_seq order, report that
> 4) If any chunk is not the expected size, report that
>
> So, for your example of chunk 1 missing from chunks [0..17], we'd report that 
> we got one fewer chunks than we expected, that the second chunk seen was 
> discontiguous from the first chunk seen, that the final chunk seen was 
> smaller than expected by M bytes, and that the total size was smaller than we 
> expected by N bytes.  The third of those is somewhat misleading, since the 
> final chunk was presumably the right size; we just weren't expecting to hit a 
> partial chunk quite yet.  But I don't see how to make that better in the 
> general case.

Hmm, that might be OK. It seems like it's going to be a bit verbose in
simple cases like 1 missing chunk, but on the plus side, it avoids a
mountain of output if the raw size has been overwritten with a
gigantic bogus value. But, how is #2 different from #3? Those sound
like the same thing to me.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: pg_amcheck contrib application

2021-04-08 Thread Mark Dilger



> On Apr 8, 2021, at 1:05 PM, Robert Haas  wrote:
> 
> On Thu, Apr 8, 2021 at 3:02 PM Mark Dilger  
> wrote:
>> Imagine a toasted attribute with 18 chunks numbered [0..17].  Then we update 
>> the toast to have only 6 chunks numbered [0..5] except we corruptly keep 
>> chunks numbered [12..17] in the toast table.  We'd rather see a report like 
>> this:
> [ toast value NNN chunk NNN has sequence number NNN, but expected
> sequence number NNN ]
>> than one like this:
> [ toast value NNN contains chunk NNN where chunk NNN was expected ]
>> 
>> because saying the toast value ended at "chunk 12" after saying that it 
>> contains "chunk 17" is contradictory.  You need the distinction between the 
>> chunk number and the chunk sequence number, since in corrupt circumstances 
>> they may not be the same.
> 
> Hmm, I see your point, and that's a good example to illustrate it.
> But, with that example in front of me, I am rather doubtful that
> either of these is what users actually want. Consider the case where I
> should have chunks 0..17 and chunk 1 is just plain gone. This, by the
> way, seems like a pretty likely case to arise in practice, since all
> we need is for a block to get truncated away or zeroed erroneously, or
> for a tuple to get pruned that shouldn't. With either of the above
> schemes, I guess we're going to get a message about every chunk from 2
> to 17, complaining that they're all misnumbered. We might also get a
> complaint that the last chunk is the wrong size, and that the total
> number of chunks isn't right. What we really want is a single
> complaint saying chunk 1 is missing.
> 
> Likewise, in your example, I sort of feel like what I really want,
> rather than either of the above outputs, is to get some messages like
> this:
> 
toast value NNN contains unexpected extra chunks [12-17]
> 
> Both your phrasing for those messages and what I suggested make it
> sound like the problem is that the chunk number is wrong. But that
> doesn't seem like it's taking the right view of the situation. Chunks
> 12-17 shouldn't exist at all, and if they do, we should say that, e.g.
> by complaining about something like "toast value 16444 chunk 12
> follows last expected chunk 5"
> 
> In other words, I don't buy the idea that the user will accept the
> idea that there's a chunk number and a chunk sequence number, and that
> they should know the difference between those things and what each of
> them are. They're entitled to imagine that there's just one thing, and
that we're going to tell them about values that are extra or missing.
> The fact that we're not doing that seems like it's just a matter of
> missing code.

Somehow, we have to get enough information about chunk_seq discontinuity into 
the output that if the user forwards it to -hackers, or if the output comes 
from a buildfarm critter, that we have all the information to help diagnose 
what went wrong.

As a specific example, if the va_rawsize suggests 2 chunks, and we find 150 
chunks all with contiguous chunk_seq values, that is different from a debugging 
point of view than if we find 150 chunks with chunk_seq values spread all over 
the [0..MAXINT] range.  We can't just tell the user that there were 148 extra 
chunks.  We also shouldn't phrase the error in terms of "extra chunks", since 
it might be the va_rawsize that is corrupt.

I agree that the current message output might be overly verbose in how it 
reports this information.  Conceptually, we want to store up information about 
the chunk issues and report them all at once, but that's hard to do in general, 
as the number of chunk_seq discontinuities might be quite large, much too large 
to fit reasonably into any one message.  Maybe we could report just the first N 
discontinuities rather than all of them, but if somebody wants to open a hex 
editor and walk through the toast table, they won't appreciate having the 
corruption information truncated like that.

All this leads me to believe that we should report the following:

1) If the total number of chunks retrieved differs from the expected number, 
report how many we expected vs. how many we got
2) If the chunk_seq numbers are discontiguous, report each discontiguity.
3) If the index scan returned chunks out of chunk_seq order, report that
4) If any chunk is not the expected size, report that

So, for your example of chunk 1 missing from chunks [0..17], we'd report that 
we got one fewer chunks than we expected, that the second chunk seen was 
discontiguous from the first chunk seen, that the final chunk seen was smaller 
than expected by M bytes, and that the total size was smaller than we expected 
by N bytes.  The third of those is somewhat misleading, since the final chunk 
was presumably the right size; we just weren't expecting to hit a partial chunk 
quite yet.  But I don't see how to make that better in the general case.

> If we start the index scan and get chunk 4, we can
> easily emit messages for chunks 0..3 

Re: pg_amcheck contrib application

2021-04-08 Thread Robert Haas
On Thu, Apr 8, 2021 at 3:02 PM Mark Dilger  wrote:
> Imagine a toasted attribute with 18 chunks numbered [0..17].  Then we update 
> the toast to have only 6 chunks numbered [0..5] except we corruptly keep 
> chunks numbered [12..17] in the toast table.  We'd rather see a report like 
> this:
[ toast value NNN chunk NNN has sequence number NNN, but expected
sequence number NNN ]
> than one like this:
[ toast value NNN contains chunk NNN where chunk NNN was expected ]
>
> because saying the toast value ended at "chunk 12" after saying that it 
> contains "chunk 17" is contradictory.  You need the distinction between the 
> chunk number and the chunk sequence number, since in corrupt circumstances 
> they may not be the same.

Hmm, I see your point, and that's a good example to illustrate it.
But, with that example in front of me, I am rather doubtful that
either of these is what users actually want. Consider the case where I
should have chunks 0..17 and chunk 1 is just plain gone. This, by the
way, seems like a pretty likely case to arise in practice, since all
we need is for a block to get truncated away or zeroed erroneously, or
for a tuple to get pruned that shouldn't. With either of the above
schemes, I guess we're going to get a message about every chunk from 2
to 17, complaining that they're all misnumbered. We might also get a
complaint that the last chunk is the wrong size, and that the total
number of chunks isn't right. What we really want is a single
complaint saying chunk 1 is missing.

Likewise, in your example, I sort of feel like what I really want,
rather than either of the above outputs, is to get some messages like
this:

toast value NNN contains unexpected extra chunks [12-17]

Both your phrasing for those messages and what I suggested make it
sound like the problem is that the chunk number is wrong. But that
doesn't seem like it's taking the right view of the situation. Chunks
12-17 shouldn't exist at all, and if they do, we should say that, e.g.
by complaining about something like "toast value 16444 chunk 12
follows last expected chunk 5"

In other words, I don't buy the idea that the user will accept the
idea that there's a chunk number and a chunk sequence number, and that
they should know the difference between those things and what each of
them are. They're entitled to imagine that there's just one thing, and
that we're going to tell them about values that are extra or missing.
The fact that we're not doing that seems like it's just a matter of
missing code. If we start the index scan and get chunk 4, we can
easily emit messages for chunks 0..3 right on the spot, declaring them
missing. Things do get a bit hairy if the index scan returns values
out of order: what if it gives us chunk_seq = 2 and then chunk_seq =
1? But I think we could handle that by just issuing a complaint in any
such case that "toast index returns chunks out of order for toast
value NNN" and stopping further checking of that toast value.

> Purely as manual testing, and not part of the patch, I hacked the backend a 
> bit to allow direct modification of the toast table.  After corrupting the 
> toast with the following bit of SQL:
>
> WITH chunk_limit AS (
> SELECT chunk_id, MAX(chunk_seq) AS maxseq
> FROM $toastname
> GROUP BY chunk_id)
> INSERT INTO $toastname (chunk_id, chunk_seq, chunk_data)
> (SELECT t.chunk_id,
> t.chunk_seq + cl.maxseq + CASE WHEN 
> t.chunk_seq < 3 THEN 1 ELSE 7 END,
> t.chunk_data
> FROM $toastname t
> INNER JOIN chunk_limit cl
> ON t.chunk_id = cl.chunk_id)
>
> pg_amcheck reports the following corruption messages:
>
> # heap table "postgres"."public"."test", block 0, offset 1, attribute 2:
> # toast value 16444 chunk 6 follows last expected chunk 5
> # heap table "postgres"."public"."test", block 0, offset 1, attribute 2:
> # toast value 16444 chunk 7 follows last expected chunk 5
> # heap table "postgres"."public"."test", block 0, offset 1, attribute 2:
> # toast value 16444 chunk 8 follows last expected chunk 5
> # heap table "postgres"."public"."test", block 0, offset 1, attribute 2:
> # toast value 16444 chunk 9 has sequence number 15, but expected sequence 
> number 9
> # heap table "postgres"."public"."test", block 0, offset 1, attribute 2:
> # toast value 16444 chunk 10 has sequence number 16, but expected 
> sequence number 10
> # heap table "postgres"."public"."test", block 0, offset 1, attribute 2:
> # toast value 16444 chunk 11 has sequence number 17, but expected 
> sequence number 11
> # heap table "postgres"."public"."test", block 0, offset 1, attribute 2:
> # toast value 16444 was expected to end at chunk 6, but ended at chunk 12
>
> I think if we'd left out the 

Re: pg_amcheck contrib application

2021-04-08 Thread Mark Dilger


> On Apr 7, 2021, at 1:16 PM, Robert Haas  wrote:
> 
> On Sun, Apr 4, 2021 at 8:02 PM Mark Dilger  
> wrote:
>> v18-0001 - Finishes work started in commit 3b6c1259f9 that was overlooked 
>> owing to how I had separated the changes in v17-0002 vs. v17-0003
> 
> Committed.

Thank you.

>> v18-0002 - Postpones the toast checks for a page until after the main table 
>> page lock is released
> 
> Committed, but I changed list_free() to list_free_deep() to avoid a
> memory leak, and I revised the commit message to mention the important
> point that we need to avoid following TOAST pointers from
> potentially-prunable tuples.

Thank you, and yes, I agree with that change.

>> v18-0003 - Improves the corruption messages in ways already discussed 
>> earlier in this thread.  Changes the tests to expect the new messages, but 
>> adds no new checks
> 
> Kibitzing your message wording:
> 
> "toast value %u chunk data is null" -> "toast value %u chunk %d has
> null data". We can mention the chunk number this way.

Changed.

> "toast value %u corrupt extended chunk has invalid varlena header: %0x
> (sequence number %d)" -> "toast value %u chunk %d has invalid varlena
> header %0x". We can be more consistent about how we incorporate the
> chunk number into the text, and we don't really need to include the
> word corrupt, because all of these are corruption complaints, and I
> think it looks better without the colon.

Changed.

> "toast value %u chunk sequence number %u does not match the expected
> sequence number %u" -> "toast value %u contains chunk %d where chunk
> %d was expected". Shorter. Uses %d for a sequence number instead of
> %u, which I think is correct -- anyway we should have them all one way
> or all the other. I think I'd rather ditch the "sequence number"
> technology and just talk about "chunk %d" or whatever.

I don't agree with this one.  I do agree with changing the message, but not to 
the message you suggest.

Imagine a toasted attribute with 18 chunks numbered [0..17].  Then we update 
the toast to have only 6 chunks numbered [0..5] except we corruptly keep chunks 
numbered [12..17] in the toast table.  We'd rather see a report like this:

# heap table "postgres"."public"."test", block 0, offset 1, attribute 2:
# toast value 16444 chunk 6 has sequence number 12, but expected sequence 
number 6
# heap table "postgres"."public"."test", block 0, offset 1, attribute 2:
# toast value 16444 chunk 7 has sequence number 13, but expected sequence 
number 7
# heap table "postgres"."public"."test", block 0, offset 1, attribute 2:
# toast value 16444 chunk 8 has sequence number 14, but expected sequence 
number 8
# heap table "postgres"."public"."test", block 0, offset 1, attribute 2:
# toast value 16444 chunk 9 has sequence number 15, but expected sequence 
number 9
# heap table "postgres"."public"."test", block 0, offset 1, attribute 2:
# toast value 16444 chunk 10 has sequence number 16, but expected sequence 
number 10
# heap table "postgres"."public"."test", block 0, offset 1, attribute 2:
# toast value 16444 chunk 11 has sequence number 17, but expected sequence 
number 11
# heap table "postgres"."public"."test", block 0, offset 1, attribute 2:
# toast value 16444 was expected to end at chunk 6, but ended at chunk 12

than one like this:

# heap table "postgres"."public"."test", block 0, offset 1, attribute 2:
# toast value 16444 contains chunk 12 where chunk 6 was expected
# heap table "postgres"."public"."test", block 0, offset 1, attribute 2:
# toast value 16444 contains chunk 13 where chunk 7 was expected
# heap table "postgres"."public"."test", block 0, offset 1, attribute 2:
# toast value 16444 contains chunk 14 where chunk 8 was expected
# heap table "postgres"."public"."test", block 0, offset 1, attribute 2:
# toast value 16444 contains chunk 15 where chunk 9 was expected
# heap table "postgres"."public"."test", block 0, offset 1, attribute 2:
# toast value 16444 contains chunk 16 where chunk 10 was expected
# heap table "postgres"."public"."test", block 0, offset 1, attribute 2:
# toast value 16444 contains chunk 17 where chunk 11 was expected
# heap table "postgres"."public"."test", block 0, offset 1, attribute 2:
# toast value 16444 was expected to end at chunk 6, but ended at chunk 12

because saying the toast value ended at "chunk 12" after saying that it 
contains "chunk 17" is contradictory.  You need the distinction between the 
chunk number and the chunk sequence number, since in corrupt circumstances they 
may not be the same.

> "toast value %u chunk sequence number %u exceeds the end chunk
> sequence number %u" -> "toast value %u chunk %d follows last expected
> chunk %d"

Changed.

> "toast value %u chunk size %u differs from the expected size %u" ->
> "toast value %u chunk %d has size %u, but expected size %u"

Changed.

> Other complaints:
> 
> Your commit message fails to mention the addition of
> 

Re: pg_amcheck contrib application

2021-04-07 Thread Robert Haas
On Sun, Apr 4, 2021 at 8:02 PM Mark Dilger  wrote:
> v18-0001 - Finishes work started in commit 3b6c1259f9 that was overlooked 
> owing to how I had separated the changes in v17-0002 vs. v17-0003

Committed.

> v18-0002 - Postpones the toast checks for a page until after the main table 
> page lock is released

Committed, but I changed list_free() to list_free_deep() to avoid a
memory leak, and I revised the commit message to mention the important
point that we need to avoid following TOAST pointers from
potentially-prunable tuples.

> v18-0003 - Improves the corruption messages in ways already discussed earlier 
> in this thread.  Changes the tests to expect the new messages, but adds no 
> new checks

Kibitzing your message wording:

"toast value %u chunk data is null" -> "toast value %u chunk %d has
null data". We can mention the chunk number this way.

"toast value %u corrupt extended chunk has invalid varlena header: %0x
(sequence number %d)" -> "toast value %u chunk %d has invalid varlena
header %0x". We can be more consistent about how we incorporate the
chunk number into the text, and we don't really need to include the
word corrupt, because all of these are corruption complaints, and I
think it looks better without the colon.

"toast value %u chunk sequence number %u does not match the expected
sequence number %u" -> "toast value %u contains chunk %d where chunk
%d was expected". Shorter. Uses %d for a sequence number instead of
%u, which I think is correct -- anyway we should have them all one way
or all the other. I think I'd rather ditch the "sequence number"
technology and just talk about "chunk %d" or whatever.

"toast value %u chunk sequence number %u exceeds the end chunk
sequence number %u" -> "toast value %u chunk %d follows last expected
chunk %d"

"toast value %u chunk size %u differs from the expected size %u" ->
"toast value %u chunk %d has size %u, but expected size %u"

Other complaints:

Your commit message fails to mention the addition of
VARATT_EXTERNAL_GET_POINTER, which is a significant change/bug fix
unrelated to message wording.

It feels like we have a non-minimal number of checks/messages for the
series of toast chunks. I think that you get a message if we find a chunk after
the last chunk we were expecting to find (curchunk > endchunk), and you also get
a message if we have the wrong number of chunks in total (chunkno !=
(endchunk + 1)). Now maybe I'm wrong, but if the first message
triggers, it seems like the second message must also trigger. Is that
wrong? If not, maybe we can get rid of the first one entirely? That's
such a small change I think we could include it in this same patch, if
it's a correct idea.

On a related note, as I think I said before, I still think we should
be rejiggering this so that we're not testing both the size of each
individual chunk and the total size, because that ought to be
redundant. That might be better done as a separate patch but I think
we should try to clean it up.

> v18-0004 - Adding corruption checks of toast pointers.  Extends the 
> regression tests to cover the new checks.

I think we could check that the result of
VARATT_EXTERNAL_GET_COMPRESS_METHOD is one of the values we expect to
see.

Using AllocSizeIsValid() seems pretty vile. I know that MaxAllocSize
is 0x3FFFFFFF in no small part because that's the maximum length that
can be represented by a varlena, but I'm not sure it's a good idea to
couple the concepts so closely like this. Maybe we can just #define
VARLENA_SIZE_LIMIT in this file and use that, and a message that says
size %u exceeds limit %u.

I'm a little worried about whether the additional test cases are
Endian-dependent at all. I don't immediately know what might be wrong
with them, but I'm going to think about that some more later. Any
chance you have access to a Big-endian box where you can test this?

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: pg_amcheck contrib application

2021-04-04 Thread Mark Dilger


> On Apr 1, 2021, at 1:08 PM, Robert Haas  wrote:
> 
> 
> 
> - There are a total of two (2) calls in the current source code to
> palloc0fast, and hundreds of calls to palloc0. So I think you should
> forget about using the fast variant and just do what almost every
> other caller does.

Done.

> - If you want to make this code faster, a better idea would be to
> avoid doing all of this allocate and free work and just allocate an
> array that's guaranteed to be big enough, and then keep track of how
> many elements of that array are actually in use.

Sounds like premature optimization to me.  I only used palloc0fast because the 
argument is compile-time known.  I wasn't specifically attempting to speed 
anything up.

> - #ifdef DECOMPRESSION_CORRUPTION_CHECKING is not a useful way of
> introducing such a feature. Either we do it for real and expose it via
> SQL and pg_amcheck as an optional behavior, or we rip it out and
> revisit it later. Given the nearness of feature freeze, my vote is for
> the latter.
> 
> - I'd remove the USE_LZ4 bit, too. Let's not define the presence of
> LZ4 data in a non-LZ4-enabled cluster as corruption. If we do, then
> people will expect to be able to use this to find places where they
> are dependent on LZ4 if they want to move away from it -- and if we
> don't recurse into composite datums, that will not actually work.

Ok, I have removed this bit.  I also removed the part of the patch that 
introduced a new corruption check, decompressing the data to see if it 
decompresses without error.

> - check_toast_tuple() has an odd and slightly unclear calling
> convention for which there are no comments. I wonder if it would be
> better to reverse things and make bool *error the return value and
> what is now the return value into a pointer argument, but whatever we
> do I think it needs a few words in the comments. We don't need to
> slavishly explain every argument -- I think toasttup and ctx and tctx
> are reasonably clear -- but this is not.
...
> - Is there a reason we need a cross-check on both the number of chunks
> and on the total size? It seems to me that we should check that each
> individual chunk has the size we expect, and that the total number of
> chunks is what we expect. The overall size is then necessarily
> correct.

Good point.  I've removed the extra check on the total size, since it cannot be 
wrong if the checks on the individual chunk sizes were all correct.  This 
eliminates the need for the odd calling convention for check_toast_tuple(), so 
I've changed that to return void and not take any return-by-reference arguments.

> - To me it would be more logical to reverse the order of the
> toast_pointer.va_toastrelid != ctx->rel->rd_rel->reltoastrelid and
> VARATT_EXTERNAL_GET_EXTSIZE(toast_pointer) > toast_pointer.va_rawsize
> - VARHDRSZ checks. Whether we're pointing at the correct relation
> feels more fundamental.

Done.

> - If we moved the toplevel foreach loop in check_toasted_attributes()
> out to the caller, say renaming the function to just
> check_toasted_attribute(), we'd save a level of indentation in that
> whole function and probably add a tad of clarity, too. You wouldn't
> feel the need to Assert(ctx.toasted_attributes == NIL) in the caller
> if the caller had just done list_free(ctx->toasted_attributes);
> ctx->toasted_attributes = NIL.

You're right.  It looks nicer that way.  Changed.

> - Why are all the changes to the tests in this patch? What do they
> have to do with getting the TOAST checks out from under the buffer
> lock? I really need you to structure the patch series so that each
> patch is about one topic and, equally, so that each topic is only
> covered by one patch. Otherwise it's just way too confusing.

v18-0001 - Finishes work started in commit 3b6c1259f9 that was overlooked owing 
to how I had separated the changes in v17-0002 vs. v17-0003

v18-0002 - Postpones the toast checks for a page until after the main table 
page lock is released

v18-0003 - Improves the corruption messages in ways already discussed earlier 
in this thread.  Changes the tests to expect the new messages, but adds no new 
checks

v18-0004 - Adding corruption checks of toast pointers.  Extends the regression 
tests to cover the new checks.

> 
> - I think some of these messages need a bit of word-smithing, too, but
> we can leave that for when we're closer to being done with this.

Ok.




v18-0001-amcheck-remove-duplicate-xid-bounds-checks.patch
Description: Binary data


v18-0002-amcheck-avoid-extra-work-while-holding-buffer-lo.patch
Description: Binary data


v18-0003-amcheck-improving-corruption-messages.patch
Description: Binary data


v18-0004-amcheck-adding-toast-pointer-corruption-checks.patch
Description: Binary data


—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company





Re: pg_amcheck contrib application

2021-04-01 Thread Robert Haas
On Thu, Apr 1, 2021 at 1:41 PM Robert Haas  wrote:
> OK, committed. We still need to deal with what you had as 0003
> upthread, so I guess the next step is for me to spend some time
> reviewing that one.

I did this, and it was a bit depressing. It appears that we now have
duplicate checks for xmin and xmax being out of the valid range.
Somehow you have the removal of those duplicate checks in 0003, but
why in the world didn't you put them into one of the previous patches
and just make 0003 about fixing the
holding-buffer-lock-while-following-TOAST-pointers problem? (And, gah,
why did I not look carefully enough to notice that you hadn't done
that?)

Other than that, I notice a few other things:

- There are a total of two (2) calls in the current source code to
palloc0fast, and hundreds of calls to palloc0. So I think you should
forget about using the fast variant and just do what almost every
other caller does.

- If you want to make this code faster, a better idea would be to
avoid doing all of this allocate and free work and just allocate an
array that's guaranteed to be big enough, and then keep track of how
many elements of that array are actually in use.

- #ifdef DECOMPRESSION_CORRUPTION_CHECKING is not a useful way of
introducing such a feature. Either we do it for real and expose it via
SQL and pg_amcheck as an optional behavior, or we rip it out and
revisit it later. Given the nearness of feature freeze, my vote is for
the latter.

- I'd remove the USE_LZ4 bit, too. Let's not define the presence of
LZ4 data in a non-LZ4-enabled cluster as corruption. If we do, then
people will expect to be able to use this to find places where they
are dependent on LZ4 if they want to move away from it -- and if we
don't recurse into composite datums, that will not actually work.

- check_toast_tuple() has an odd and slightly unclear calling
convention for which there are no comments. I wonder if it would be
better to reverse things and make bool *error the return value and
what is now the return value into a pointer argument, but whatever we
do I think it needs a few words in the comments. We don't need to
slavishly explain every argument -- I think toasttup and ctx and tctx
are reasonably clear -- but this is not.

- To me it would be more logical to reverse the order of the
toast_pointer.va_toastrelid != ctx->rel->rd_rel->reltoastrelid and
VARATT_EXTERNAL_GET_EXTSIZE(toast_pointer) > toast_pointer.va_rawsize
- VARHDRSZ checks. Whether we're pointing at the correct relation
feels more fundamental.

- If we moved the toplevel foreach loop in check_toasted_attributes()
out to the caller, say renaming the function to just
check_toasted_attribute(), we'd save a level of indentation in that
whole function and probably add a tad of clarity, too. You wouldn't
feel the need to Assert(ctx.toasted_attributes == NIL) in the caller
if the caller had just done list_free(ctx->toasted_attributes);
ctx->toasted_attributes = NIL.

- Is there a reason we need a cross-check on both the number of chunks
and on the total size? It seems to me that we should check that each
individual chunk has the size we expect, and that the total number of
chunks is what we expect. The overall size is then necessarily
correct.

- Why are all the changes to the tests in this patch? What do they
have to do with getting the TOAST checks out from under the buffer
lock? I really need you to structure the patch series so that each
patch is about one topic and, equally, so that each topic is only
covered by one patch. Otherwise it's just way too confusing.

- I think some of these messages need a bit of word-smithing, too, but
we can leave that for when we're closer to being done with this.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: pg_amcheck contrib application

2021-04-01 Thread Robert Haas
On Thu, Apr 1, 2021 at 1:24 PM Mark Dilger  wrote:
> > On Apr 1, 2021, at 10:20 AM, Robert Haas  wrote:
> > OK, let's try that again.
> Looks good!

OK, committed. We still need to deal with what you had as 0003
upthread, so I guess the next step is for me to spend some time
reviewing that one.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: pg_amcheck contrib application

2021-04-01 Thread Mark Dilger



> On Apr 1, 2021, at 10:20 AM, Robert Haas  wrote:
> 
> On Thu, Apr 1, 2021 at 1:06 PM Mark Dilger  
> wrote:
>> Seems fine other than the typo.
> 
> OK, let's try that again.

Looks good!

—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company







Re: pg_amcheck contrib application

2021-04-01 Thread Robert Haas
On Thu, Apr 1, 2021 at 1:06 PM Mark Dilger  wrote:
> Seems fine other than the typo.

OK, let's try that again.

-- 
Robert Haas
EDB: http://www.enterprisedb.com


v17-0001-amcheck-Fix-verify_heapam-s-tuple-visibility-che.patch
Description: Binary data


Re: pg_amcheck contrib application

2021-04-01 Thread Mark Dilger



> On Apr 1, 2021, at 9:56 AM, Robert Haas  wrote:
> 
> On Thu, Apr 1, 2021 at 12:32 PM Mark Dilger
>  wrote:
>>> - If xmax is a multi but seems to be garbled, I changed it to return
>>> true rather than false. The inserter is known to have committed by
>>> that point, so I think it's OK to try to deform the tuple. We just
>>> shouldn't try to check TOAST.
>> 
>> It is hard to know what to do when at least one tuple header field is 
>> corrupt.  You don't necessarily know which one it is.  For example, if 
>> HEAP_XMAX_IS_MULTI is set, we try to interpret the xmax as a mxid, and if it 
>> is out of bounds, we report it as corrupt.  But was the xmax corrupt?  Or 
>> was the HEAP_XMAX_IS_MULTI bit corrupt?  It's not clear.  I took the view 
>> that if either xmin or xmax appear to be corrupt when interpreted in light 
>> of the various tuple header bits, all we really know is that the set of 
>> fields/bits don't make sense as a whole, so we report corruption, don't 
>> trust any of them, and abort further checking of the tuple.  You have be 
>> burden of proof the other way around.  If the xmin appears fine, and xmax 
>> appears corrupt, then we only know that xmax is corrupt, so the tuple is 
>> checkable because according to the xmin it committed.
> 
> I agree that it's hard to be sure what's gone once we start finding
> corrupted data, but deciding that maybe xmin didn't really commit
> because we see that there's something wrong with xmax seems excessive
> to me. I thought about a related case: if xmax is a bad multi but is
> also hinted invalid, should we try to follow TOAST pointers? I think
> that's hard to say, because we don't know whether (1) the invalid
> marking is in error, (2) it's wrong to consider it a multi rather than
> an XID, (3) the stored multi got overwritten with a garbage value, or
> (4) the stored multi got removed before the tuple was frozen. Not
> knowing which of those is the case, how are we supposed to decide
> whether the TOAST tuples might have been (or be about to get) pruned?
> 
> But, in the case we're talking about here, I don't think it's a
> particularly close decision. All we need to say is that if xmax or the
> infomask bits pertaining to it are corrupted, we're still going to
> suppose that xmin and the infomask bits pertaining to it, which are
> all different bytes and bits, are OK. To me, the contrary decision,
> namely that a bogus xmax means xmin was probably lying about the
> transaction having been committed in the first place, seems like a
> serious overreaction. As you say:
> 
>> I don't think how you have it causes undue problems, since deforming the 
>> tuple when you shouldn't merely risks a bunch of extra not-so-helpful 
>> corruption messages.  And hey, maybe they're helpful to somebody clever 
>> enough to diagnose why that particular bit of noise was generated.
> 
> I agree. The biggest risk here is that we might omit >0 complaints
> when only 0 are justified. That will panic users. The possibility that
> we might omit >x complaints when only x are justified, for some x>0,
> is also a risk, but it's not nearly as bad, because there's definitely
> something wrong, and it's just a question of what it is exactly. So we
> have to be really conservative about saying that X is corruption if
> there's any possibility that it might be fine. But once we've
> complained about one thing, we can take a more balanced approach about
> whether to risk issuing more complaints. The possibility that
> suppressing the additional complaints might complicate resolution of
> the issue also needs to be considered.

This all seems fine to me.  The main thing is that we don't go on to check the 
toast, which we don't.

>>* If xmin_status happens to be XID_IN_PROGRESS, then in theory
>> 
>> Did you mean to say XID_IS_CURRENT_XID here?
> 
> Yes, I did, thanks.

Ouch.  You've got a typo:  s/XID_IN_CURRENT_XID/XID_IS_CURRENT_XID/

>> /* xmax is an MXID, not an MXID. Sanity check it. */
>> 
>> Is it an MXID or isn't it?
> 
> Good catch.
> 
> New patch attached.

Seems fine other than the typo.

—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company







Re: pg_amcheck contrib application

2021-04-01 Thread Robert Haas
On Thu, Apr 1, 2021 at 12:32 PM Mark Dilger
 wrote:
> > - If xmax is a multi but seems to be garbled, I changed it to return
> > true rather than false. The inserter is known to have committed by
> > that point, so I think it's OK to try to deform the tuple. We just
> > shouldn't try to check TOAST.
>
> It is hard to know what to do when at least one tuple header field is 
> corrupt.  You don't necessarily know which one it is.  For example, if 
> HEAP_XMAX_IS_MULTI is set, we try to interpret the xmax as a mxid, and if it 
> is out of bounds, we report it as corrupt.  But was the xmax corrupt?  Or was 
> the HEAP_XMAX_IS_MULTI bit corrupt?  It's not clear.  I took the view that if 
> either xmin or xmax appear to be corrupt when interpreted in light of the 
> various tuple header bits, all we really know is that the set of fields/bits 
> don't make sense as a whole, so we report corruption, don't trust any of 
> them, and abort further checking of the tuple.  You have the burden of proof 
> the other way around.  If the xmin appears fine, and xmax appears corrupt, 
> then we only know that xmax is corrupt, so the tuple is checkable because 
> according to the xmin it committed.

I agree that it's hard to be sure what's gone once we start finding
corrupted data, but deciding that maybe xmin didn't really commit
because we see that there's something wrong with xmax seems excessive
to me. I thought about a related case: if xmax is a bad multi but is
also hinted invalid, should we try to follow TOAST pointers? I think
that's hard to say, because we don't know whether (1) the invalid
marking is in error, (2) it's wrong to consider it a multi rather than
an XID, (3) the stored multi got overwritten with a garbage value, or
(4) the stored multi got removed before the tuple was frozen. Not
knowing which of those is the case, how are we supposed to decide
whether the TOAST tuples might have been (or be about to get) pruned?

But, in the case we're talking about here, I don't think it's a
particularly close decision. All we need to say is that if xmax or the
infomask bits pertaining to it are corrupted, we're still going to
suppose that xmin and the infomask bits pertaining to it, which are
all different bytes and bits, are OK. To me, the contrary decision,
namely that a bogus xmax means xmin was probably lying about the
transaction having been committed in the first place, seems like a
serious overreaction. As you say:

> I don't think how you have it causes undue problems, since deforming the 
> tuple when you shouldn't merely risks a bunch of extra not-so-helpful 
> corruption messages.  And hey, maybe they're helpful to somebody clever 
> enough to diagnose why that particular bit of noise was generated.

I agree. The biggest risk here is that we might omit >0 complaints
when only 0 are justified. That will panic users. The possibility that
we might omit >x complaints when only x are justified, for some x>0,
is also a risk, but it's not nearly as bad, because there's definitely
something wrong, and it's just a question of what it is exactly. So we
have to be really conservative about saying that X is corruption if
there's any possibility that it might be fine. But once we've
complained about one thing, we can take a more balanced approach about
whether to risk issuing more complaints. The possibility that
suppressing the additional complaints might complicate resolution of
the issue also needs to be considered.

> * If xmin_status happens to be XID_IN_PROGRESS, then in theory
>
> Did you mean to say XID_IS_CURRENT_XID here?

Yes, I did, thanks.

> /* xmax is an MXID, not an MXID. Sanity check it. */
>
> Is it an MXID or isn't it?

Good catch.

New patch attached.

-- 
Robert Haas
EDB: http://www.enterprisedb.com


v16-0001-amcheck-Fix-verify_heapam-s-tuple-visibility-che.patch
Description: Binary data


Re: pg_amcheck contrib application

2021-04-01 Thread Mark Dilger



> On Apr 1, 2021, at 8:08 AM, Robert Haas  wrote:
> 
> On Wed, Mar 31, 2021 at 12:34 AM Mark Dilger
>  wrote:
>> These changes got lost between v11 and v12.  I've put them back, as well as 
>> updating to use your language.
> 
> Here's an updated patch that includes your 0001 and 0002 plus a bunch
> of changes by me. I intend to commit this version unless somebody
> spots a problem with it.
> 
> Here are the things I changed:
> 
> - I renamed tuple_can_be_pruned to tuple_could_be_pruned because I
> think it does a better job suggesting that we're uncertain about what
> will happen.

+1

> - I got rid of bool header_garbled and changed it to bool result,
> inverting the sense, because it didn't seem useful to have a function
> that ended with if (some_boolean) return false; return true when I
> could end the function with return some_other_boolean.

+1

> - I removed all the one-word comments that said /* corrupt */ or /*
> checkable */ because they seemed redundant.

Ok.

> - In the xmin_status section of check_tuple_visibility(), I got rid of
> the xmin_status == XID_IS_CURRENT_XID and xmin_status ==
> XID_IN_PROGRESS cases because they were redundant with the xmin_status
> != XID_COMMITTED case.

Ok.

> - If xmax is a multi but seems to be garbled, I changed it to return
> true rather than false. The inserter is known to have committed by
> that point, so I think it's OK to try to deform the tuple. We just
> shouldn't try to check TOAST.

It is hard to know what to do when at least one tuple header field is corrupt.  
You don't necessarily know which one it is.  For example, if HEAP_XMAX_IS_MULTI 
is set, we try to interpret the xmax as a mxid, and if it is out of bounds, we 
report it as corrupt.  But was the xmax corrupt?  Or was the HEAP_XMAX_IS_MULTI 
bit corrupt?  It's not clear.  I took the view that if either xmin or xmax 
appear to be corrupt when interpreted in light of the various tuple header 
bits, all we really know is that the set of fields/bits don't make sense as a 
whole, so we report corruption, don't trust any of them, and abort further 
checking of the tuple.  You have the burden of proof the other way around.  If 
the xmin appears fine, and xmax appears corrupt, then we only know that xmax is 
corrupt, so the tuple is checkable because according to the xmin it committed.

I don't think how you have it causes undue problems, since deforming the tuple 
when you shouldn't merely risks a bunch of extra not-so-helpful corruption 
messages.  And hey, maybe they're helpful to somebody clever enough to diagnose 
why that particular bit of noise was generated. 

> - I changed both switches over xmax_status to break in each case and
> then return true after instead of returning true for each case. I
> think this is clearer.

Ok.

> - I changed get_xid_status() to perform the TransactionIdIs... checks
> in the same order that HeapTupleSatisfies...() does them. I believe
> that it's incorrect to conclude that the transaction must be in
> progress because it neither IsCurrentTransaction nor DidCommit nor
> DidAbort, because all of those things will be false for a transaction
> that is running at the time of a system crash. The correct rule is
> that if it neither IsCurrentTransaction nor IsInProgress nor DidCommit
> then it's aborted.

Ok.

> - I moved a few comments and rewrote some others, including some of
> the ones that you took from my earlier draft patch. The thing is, the
> comment needs to be adjusted based on where you put it. Like, I had a
> comment that says "It should be impossible for xvac to still be
> running, since we've removed all that code, but even if it were, it
> ought to be safe to read the tuple, since the original inserter must
> have committed.  But, if the xvac transaction committed, this tuple
> (and its associated TOAST tuples) could be pruned at any time." which
> in my version was right before a TransactionIdDidCommit() test, and
> explains why that test is there and why the code does what it does as
> a result. But in your version you've moved it to a place where we've
> already tested that the transaction has committed, and more
> importantly, where we've already tested that it's not still running.
> Saying that it "should" be impossible for it not to be running when
> we've *actually checked that* doesn't make nearly as much sense as it
> does when we haven't checked that and aren't going to do so.


* If xmin_status happens to be XID_IN_PROGRESS, then in theory

Did you mean to say XID_IS_CURRENT_XID here?

/* xmax is an MXID, not an MXID. Sanity check it. */ 

Is it an MXID or isn't it?

—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company







Re: pg_amcheck contrib application

2021-04-01 Thread Robert Haas
On Wed, Mar 31, 2021 at 12:34 AM Mark Dilger
 wrote:
> These changes got lost between v11 and v12.  I've put them back, as well as 
> updating to use your language.

Here's an updated patch that includes your 0001 and 0002 plus a bunch
of changes by me. I intend to commit this version unless somebody
spots a problem with it.

Here are the things I changed:

- I renamed tuple_can_be_pruned to tuple_could_be_pruned because I
think it does a better job suggesting that we're uncertain about what
will happen.

- I got rid of bool header_garbled and changed it to bool result,
inverting the sense, because it didn't seem useful to have a function
that ended with if (some_boolean) return false; return true when I
could end the function with return some_other_boolean.

- I removed all the one-word comments that said /* corrupt */ or /*
checkable */ because they seemed redundant.

- In the xmin_status section of check_tuple_visibility(), I got rid of
the xmin_status == XID_IS_CURRENT_XID and xmin_status ==
XID_IN_PROGRESS cases because they were redundant with the xmin_status
!= XID_COMMITTED case.

- If xmax is a multi but seems to be garbled, I changed it to return
true rather than false. The inserter is known to have committed by
that point, so I think it's OK to try to deform the tuple. We just
shouldn't try to check TOAST.

- I changed both switches over xmax_status to break in each case and
then return true after instead of returning true for each case. I
think this is clearer.

- I changed get_xid_status() to perform the TransactionIdIs... checks
in the same order that HeapTupleSatisfies...() does them. I believe
that it's incorrect to conclude that the transaction must be in
progress because it neither IsCurrentTransaction nor DidCommit nor
DidAbort, because all of those things will be false for a transaction
that is running at the time of a system crash. The correct rule is
that if it neither IsCurrentTransaction nor IsInProgress nor DidCommit
then it's aborted.

- I moved a few comments and rewrote some others, including some of
the ones that you took from my earlier draft patch. The thing is, the
comment needs to be adjusted based on where you put it. Like, I had a
comment that says "It should be impossible for xvac to still be
running, since we've removed all that code, but even if it were, it
ought to be safe to read the tuple, since the original inserter must
have committed.  But, if the xvac transaction committed, this tuple
(and its associated TOAST tuples) could be pruned at any time." which
in my version was right before a TransactionIdDidCommit() test, and
explains why that test is there and why the code does what it does as
a result. But in your version you've moved it to a place where we've
already tested that the transaction has committed, and more
importantly, where we've already tested that it's not still running.
Saying that it "should" be impossible for it not to be running when
we've *actually checked that* doesn't make nearly as much sense as it
does when we haven't checked that and aren't going to do so.

-- 
Robert Haas
EDB: http://www.enterprisedb.com


v15-0001-amcheck-Fix-verify_heapam-s-tuple-visibility-che.patch
Description: Binary data


Re: pg_amcheck contrib application

2021-03-31 Thread Robert Haas
On Wed, Mar 31, 2021 at 1:44 PM Mark Dilger
 wrote:
> I read "exclusively locks" as meaning it takes an ExclusiveLock, but the code 
> shows that it takes an AccessExclusiveLock.  I think the docs are pretty 
> misleading here, though I understand that grammatically it is hard to say 
> "accessively exclusively locks" or such.  But a part of my analysis was based 
> on the reasoning that if VF only takes an ExclusiveLock, then there must be 
> concurrent readers possible.  VF went away long enough ago that I had 
> forgotten exactly how inconvenient it was.

It kinda depends on what you mean by concurrent readers, because a
transaction that could start on Monday and acquire an XID, and then on
Tuesday you could run VACUUM FULL on relation "foo", and then on
Wednesday the transaction from before could get around to reading some
data from "foo". The two transactions are concurrent, in the sense
that the 3-day transaction was running before the VACUUM FULL, was
still running after VACUUM FULL, read the same pages that the VACUUM
FULL modified, and cares whether the XID from the VACUUM FULL
committed or aborted. But, it's not concurrent in the sense that you
never have a situation where the VACUUM FULL does some of its
modifications, then an overlapping transaction sees them, and then it
does the rest of them.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: pg_amcheck contrib application

2021-03-31 Thread Mark Dilger



> On Mar 31, 2021, at 10:31 AM, Mark Dilger  
> wrote:
> 
> 
> 
>> On Mar 31, 2021, at 10:11 AM, Robert Haas  wrote:
>> 
>> On Wed, Mar 31, 2021 at 12:34 AM Mark Dilger
>>  wrote:
>>> I'm not looking at the old VACUUM FULL code, but my assumption is that if 
>>> the xvac code were resurrected, then when a tuple is moved off by a VACUUM 
>>> FULL, the old tuple and associated toast cannot be pruned until concurrent 
>>> transactions end.  So, if amcheck is running more-or-less concurrently with 
>>> the VACUUM FULL and has a snapshot xmin no newer than the xid of the VACUUM 
>>> FULL's xid, it can check the toast associated with the moved off tuple 
>>> after the VACUUM FULL commits.  If instead the VACUUM FULL xid was older 
>>> than amcheck's xmin, then the toast is in danger of being vacuumed away.  
>>> So the logic in verify_heapam would need to change to think about this 
>>> distinction.  We don't have to concern ourselves about that, because VACUUM 
>>> FULL cannot be running, and so the xid for it must be older than our xmin, 
>>> and hence the toast is unconditionally not safe to check.
>>> 
>>> I'm changing the comments back to how you had them, but I'd like to know 
>>> why my reasoning is wrong.
>> 
>> Let's start by figuring out *whether* your reasoning is wrong. My
>> assumption was that old-style VACUUM FULL would move tuples without
>> retoasting. That is, if we decided to move a tuple from page 2 of the
>> main table to page 1, we would just write the tuple into page 1,
>> marking it moved-in, and on page 2 we would mark it moved-off. And
>> that we would not examine or follow any TOAST pointers at all, so
>> whatever TOAST entries existed would be shared between the two tuples.
>> One tuple or the other would eventually die, depending on whether xvac
>> went on to commit or abort, but either way the TOAST doesn't need
>> updating because there's always exactly 1 remaining tuple using
>> pointers to those TOAST values.
>> 
>> Your assumption seems to be the opposite, that the TOASTed values
>> would be retoasted as part of VF. If that is true, then your idea is
>> right.
>> 
>> Do you agree with this analysis? If so, we can check the code and find
>> out which way it actually worked.
> 
> Actually, that makes a lot of sense without even looking at the old code.  I 
> was implicitly assuming that the toast table was undergoing a VF also, and 
> that the toast pointers in the main table tuples would be updated to point to 
> the new location, so we'd be unable to follow the pointers to the old 
> location without danger of the old location entries being vacuumed away. But 
> if the main table tuples get moved while keeping their toast pointers 
> unaltered, then you don't have to worry about that, although you do have to 
> worry that a VF of the main table doesn't help so much with toast table bloat.
> 
> We're only discussing this in order to craft the right comment for a bit of 
> code with respect to a hypothetical situation in which VF gets resurrected, 
> so I'm not sure this should be top priority, but I'm curious enough now to go 
> read the old code

Well, that's annoying.  The documentation of postgres 8.2 for vacuum full [1] 
says,

  Selects "full" vacuum, which may reclaim more space, but takes much longer 
and exclusively locks the table.

I read "exclusively locks" as meaning it takes an ExclusiveLock, but the code 
shows that it takes an AccessExclusiveLock.  I think the docs are pretty 
misleading here, though I understand that grammatically it is hard to say 
"accessively exclusively locks" or such.  But a part of my analysis was based 
on the reasoning that if VF only takes an ExclusiveLock, then there must be 
concurrent readers possible.  VF went away long enough ago that I had forgotten 
exactly how inconvenient it was.

[1] https://www.postgresql.org/docs/8.2/sql-vacuum.html

—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company







Re: pg_amcheck contrib application

2021-03-31 Thread Robert Haas
On Wed, Mar 31, 2021 at 1:31 PM Mark Dilger
 wrote:
> Actually, that makes a lot of sense without even looking at the old code.  I 
> was implicitly assuming that the toast table was undergoing a VF also, and 
> that the toast pointers in the main table tuples would be updated to point to 
> the new location, so we'd be unable to follow the pointers to the old 
> location without danger of the old location entries being vacuumed away.  But 
> if the main table tuples get moved while keeping their toast pointers 
> unaltered, then you don't have to worry about that, although you do have to 
> worry that a VF of the main table doesn't help so much with toast table bloat.
>
> We're only discussing this in order to craft the right comment for a bit of 
> code with respect to a hypothetical situation in which VF gets resurrected, 
> so I'm not sure this should be top priority, but I'm curious enough now to go 
> read the old code

Right, well, we wouldn't be PostgreSQL hackers if we didn't spend lots
of time worrying about obscure details. Whether that's good software
engineering or mere pedantry is sometimes debatable.

I took a look at commit 0a469c87692d15a22eaa69d4b3a43dd8e278dd64,
which removed old-style VACUUM FULL, and AFAICS, it doesn't contain
any references to tuple deforming, varlena, HeapTupleHasExternal, or
anything else that would make me think it has the foggiest idea
whether the tuples it's moving around contain TOAST pointers, so I
think I had the right idea.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: pg_amcheck contrib application

2021-03-31 Thread Mark Dilger



> On Mar 31, 2021, at 10:11 AM, Robert Haas  wrote:
> 
> On Wed, Mar 31, 2021 at 12:34 AM Mark Dilger
>  wrote:
>> I'm not looking at the old VACUUM FULL code, but my assumption is that if 
>> the xvac code were resurrected, then when a tuple is moved off by a VACUUM 
>> FULL, the old tuple and associated toast cannot be pruned until concurrent 
>> transactions end.  So, if amcheck is running more-or-less concurrently with 
>> the VACUUM FULL and has a snapshot xmin no newer than the xid of the VACUUM 
>> FULL's xid, it can check the toast associated with the moved off tuple after 
>> the VACUUM FULL commits.  If instead the VACUUM FULL xid was older than 
>> amcheck's xmin, then the toast is in danger of being vacuumed away.  So the 
>> logic in verify_heapam would need to change to think about this distinction. 
>>  We don't have to concern ourselves about that, because VACUUM FULL cannot 
>> be running, and so the xid for it must be older than our xmin, and hence the 
>> toast is unconditionally not safe to check.
>> 
>> I'm changing the comments back to how you had them, but I'd like to know why 
>> my reasoning is wrong.
> 
> Let's start by figuring out *whether* your reasoning is wrong. My
> assumption was that old-style VACUUM FULL would move tuples without
> retoasting. That is, if we decided to move a tuple from page 2 of the
> main table to page 1, we would just write the tuple into page 1,
> marking it moved-in, and on page 2 we would mark it moved-off. And
> that we would not examine or follow any TOAST pointers at all, so
> whatever TOAST entries existed would be shared between the two tuples.
> One tuple or the other would eventually die, depending on whether xvac
> went on to commit or abort, but either way the TOAST doesn't need
> updating because there's always exactly 1 remaining tuple using
> pointers to those TOAST values.
> 
> Your assumption seems to be the opposite, that the TOASTed values
> would be retoasted as part of VF. If that is true, then your idea is
> right.
> 
> Do you agree with this analysis? If so, we can check the code and find
> out which way it actually worked.

Actually, that makes a lot of sense without even looking at the old code.  I 
was implicitly assuming that the toast table was undergoing a VF also, and that 
the toast pointers in the main table tuples would be updated to point to the 
new location, so we'd be unable to follow the pointers to the old location 
without danger of the old location entries being vacuumed away.  But if the 
main table tuples get moved while keeping their toast pointers unaltered, then 
you don't have to worry about that, although you do have to worry that a VF of 
the main table doesn't help so much with toast table bloat.

We're only discussing this in order to craft the right comment for a bit of 
code with respect to a hypothetical situation in which VF gets resurrected, so 
I'm not sure this should be top priority, but I'm curious enough now to go read 
the old code

—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company







Re: pg_amcheck contrib application

2021-03-31 Thread Robert Haas
On Wed, Mar 31, 2021 at 12:34 AM Mark Dilger
 wrote:
> I'm not looking at the old VACUUM FULL code, but my assumption is that if the 
> xvac code were resurrected, then when a tuple is moved off by a VACUUM FULL, 
> the old tuple and associated toast cannot be pruned until concurrent 
> transactions end.  So, if amcheck is running more-or-less concurrently with 
> the VACUUM FULL and has a snapshot xmin no newer than the xid of the VACUUM 
> FULL's xid, it can check the toast associated with the moved off tuple after 
> the VACUUM FULL commits.  If instead the VACUUM FULL xid was older than 
> amcheck's xmin, then the toast is in danger of being vacuumed away.  So the 
> logic in verify_heapam would need to change to think about this distinction.  
> We don't have to concern ourselves about that, because VACUUM FULL cannot be 
> running, and so the xid for it must be older than our xmin, and hence the 
> toast is unconditionally not safe to check.
>
> I'm changing the comments back to how you had them, but I'd like to know why 
> my reasoning is wrong.

Let's start by figuring out *whether* your reasoning is wrong. My
assumption was that old-style VACUUM FULL would move tuples without
retoasting. That is, if we decided to move a tuple from page 2 of the
main table to page 1, we would just write the tuple into page 1,
marking it moved-in, and on page 2 we would mark it moved-off. And
that we would not examine or follow any TOAST pointers at all, so
whatever TOAST entries existed would be shared between the two tuples.
One tuple or the other would eventually die, depending on whether xvac
went on to commit or abort, but either way the TOAST doesn't need
updating because there's always exactly 1 remaining tuple using
pointers to those TOAST values.

Your assumption seems to be the opposite, that the TOASTed values
would be retoasted as part of VF. If that is true, then your idea is
right.

Do you agree with this analysis? If so, we can check the code and find
out which way it actually worked.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: pg_amcheck contrib application

2021-03-30 Thread Mark Dilger


> On Mar 30, 2021, at 12:45 PM, Robert Haas  wrote:
> 
> On Mon, Mar 29, 2021 at 7:16 PM Mark Dilger
>  wrote:
>> Sure, here are four patches which do the same as the single v12 patch did.
> 
> Thanks. Here are some comments on 0003 and 0004:
> 
> When you posted v11, you said that "Rather than print out all four
> toast pointer fields for each toast failure, va_rawsize, va_extsize,
> and va_toastrelid are only mentioned in the corruption message if they
> are related to the specific corruption.  Otherwise, just the
> va_valueid is mentioned in the corruption message." I like that
> principal; in fact, as you know, I suggested it. But, with the v13
> patches applied, exactly zero of the callers to
> report_toast_corruption() appear to be following it, because none of
> them include the value ID. I think you need to revise the messages,
> e.g.

These changes got lost between v11 and v12.  I've put them back, as well as 
updating to use your language.

> "toasted value for attribute %u missing from toast table" ->
> "toast value %u not found in toast table";

Changed.

> "final toast chunk number
> %u differs from expected value %u" -> "toast value %u was expected to
> end at chunk %u, but ended at chunk %u";

Changed.

> "toast chunk sequence number
> is null" -> "toast value %u has toast chunk with null sequence
> number".

Changed.

> In the first of those example cases, I think you need not
> mention the attribute number because it's already there in its own
> column.

Correct.  I'd removed that but lost that work in v12.

> On a related note, it doesn't look like you are actually checking
> va_toastrelid here. Doing so seems like it would be a good idea. It
> also seems like it would be good to check that the compressed size is
> less than or equal to the uncompressed size.

Yeah, those checks were in v11 but got lost when I changed things for v12.  
They are back in v14.

> I do not like the name tuple_is_volatile, because volatile has a
> couple of meanings already, and this isn't one of them. A
> SQL-callable function is volatile if it might return different outputs
> given the same inputs, even within the same SQL statement. A C
> variable is volatile if it might be magically modified in ways not
> known to the compiler. I had suggested tuple_cannot_die_now, which is
> closer to the mark. If you want to be even more precise, you could
> talk about whether the tuple is potentially prunable (e.g.
> tuple_can_be_pruned, which inverts the sense). That's really what
> we're worried about: whether MVCC rules would permit the tuple to be
> pruned after we release the buffer lock and before we check TOAST.

I used "tuple_can_be_pruned".  I didn't like "tuple_cannot_die_now", and still 
don't like that name, as it has several wrong interpretations.  One meaning of 
"cannot die now" is that it has become immortal.  Another is "cannot be deleted 
from the table".  

> I would ideally prefer not to rename report_corruption(). The old name
> is clearer, and changing it produces a bunch of churn that I'd rather
> avoid. Perhaps the common helper function could be called
> report_corruption_internal(), and the callers could be
> report_corruption() and report_toast_corruption().

Yes, hence the commit message in the previous patch set, "This patch can 
probably be left out if the committer believes it creates more git churn than 
it is worth."  I've removed this patch from this next patch set, and used the 
function names you suggest.

> Regarding 0001 and 0002, I think the logic in 0002 looks a lot closer
> to correct now, but I want to go through it in more detail. I think,
> though, that you've made some of my comments worse. For example, I
> wrote: "It should be impossible for xvac to still be running, since
> we've removed all that code, but even if it were, it ought to be safe
> to read the tuple, since the original inserter must have committed.
> But, if the xvac transaction committed, this tuple (and its associated
> TOAST tuples) could be pruned at any time." You changed that to read
> "We don't bother comparing against safe_xmin because the VACUUM FULL
> must have committed prior to an upgrade and can't still be running."
> Your comment is shorter, which is a point in its favor, but what I was
> trying to emphasize is that the logic would be correct EVEN IF we
> again started to use HEAP_MOVED_OFF and HEAP_MOVED_IN again. Your
> version makes it sound like the code would need to be revised in that
> case. If that's true, then my comment was wrong, but I didn't think it
> was true, or I wouldn't have written the comment in that way.

I think the logic would have to change if we brought back the old VACUUM FULL 
behavior.

I'm not looking at the old VACUUM FULL code, but my assumption is that if the 
xvac code were resurrected, then when a tuple is moved off by a VACUUM FULL, 
the old tuple and associated toast cannot be pruned until concurrent 
transactions end.  So, if amcheck is running more-or-less 

Re: pg_amcheck contrib application

2021-03-30 Thread Robert Haas
On Mon, Mar 29, 2021 at 7:16 PM Mark Dilger
 wrote:
> Sure, here are four patches which do the same as the single v12 patch did.

Thanks. Here are some comments on 0003 and 0004:

When you posted v11, you said that "Rather than print out all four
toast pointer fields for each toast failure, va_rawsize, va_extsize,
and va_toastrelid are only mentioned in the corruption message if they
are related to the specific corruption.  Otherwise, just the
va_valueid is mentioned in the corruption message." I like that
principle; in fact, as you know, I suggested it. But, with the v13
patches applied, exactly zero of the callers to
report_toast_corruption() appear to be following it, because none of
them include the value ID. I think you need to revise the messages,
e.g. "toasted value for attribute %u missing from toast table" ->
"toast value %u not found in toast table"; "final toast chunk number
%u differs from expected value %u" -> "toast value %u was expected to
end at chunk %u, but ended at chunk %u"; "toast chunk sequence number
is null" -> "toast value %u has toast chunk with null sequence
number". In the first of those example cases, I think you need not
mention the attribute number because it's already there in its own
column.

On a related note, it doesn't look like you are actually checking
va_toastrelid here. Doing so seems like it would be a good idea. It
also seems like it would be good to check that the compressed size is
less than or equal to the uncompressed size.

I do not like the name tuple_is_volatile, because volatile has a
couple of meanings already, and this isn't one of them. A
SQL-callable function is volatile if it might return different outputs
given the same inputs, even within the same SQL statement. A C
variable is volatile if it might be magically modified in ways not
known to the compiler. I had suggested tuple_cannot_die_now, which is
closer to the mark. If you want to be even more precise, you could
talk about whether the tuple is potentially prunable (e.g.
tuple_can_be_pruned, which inverts the sense). That's really what
we're worried about: whether MVCC rules would permit the tuple to be
pruned after we release the buffer lock and before we check TOAST.

I would ideally prefer not to rename report_corruption(). The old name
is clearer, and changing it produces a bunch of churn that I'd rather
avoid. Perhaps the common helper function could be called
report_corruption_internal(), and the callers could be
report_corruption() and report_toast_corruption().

Regarding 0001 and 0002, I think the logic in 0002 looks a lot closer
to correct now, but I want to go through it in more detail. I think,
though, that you've made some of my comments worse. For example, I
wrote: "It should be impossible for xvac to still be running, since
we've removed all that code, but even if it were, it ought to be safe
to read the tuple, since the original inserter must have committed.
But, if the xvac transaction committed, this tuple (and its associated
TOAST tuples) could be pruned at any time." You changed that to read
"We don't bother comparing against safe_xmin because the VACUUM FULL
must have committed prior to an upgrade and can't still be running."
Your comment is shorter, which is a point in its favor, but what I was
trying to emphasize is that the logic would be correct EVEN IF we
again started to use HEAP_MOVED_OFF and HEAP_MOVED_IN again. Your
version makes it sound like the code would need to be revised in that
case. If that's true, then my comment was wrong, but I didn't think it
was true, or I wouldn't have written the comment in that way.

Also, and maybe this is a job for a separate patch, but then again
maybe not, I wonder if it's really a good idea for get_xid_status to
return both a XidBoundsViolation and an XidCommitStatus. It seems to
me (without checking all that carefully) that it might be better to
just flatten all of that into a single enum, because right now it
seems like you often end up with two consecutive switch statements
where, perhaps, just one would suffice.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: pg_amcheck contrib application

2021-03-29 Thread Mark Dilger


> On Mar 29, 2021, at 1:06 PM, Robert Haas  wrote:
> 
> Hmm, so this got ~10x bigger from my version. Could you perhaps
> separate it out into a series of patches for easier review? Say, one
> that just fixes the visibility logic, and then a second to avoid doing
> the TOAST check with a buffer lock held, and then more than that if
> there are other pieces that make sense to separate out?

Sure, here are four patches which do the same as the single v12 patch did.




v13-0001-Refactoring-function-check_tuple_header_and_visi.patch
Description: Binary data


v13-0002-Replacing-implementation-of-check_tuple_visibili.patch
Description: Binary data


v13-0003-Renaming-report_corruption-as-report_main_corrup.patch
Description: Binary data


v13-0004-Checking-toast-separately-from-the-main-table.patch
Description: Binary data

—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company





Re: pg_amcheck contrib application

2021-03-29 Thread Robert Haas
On Mon, Mar 29, 2021 at 1:45 PM Mark Dilger
 wrote:
> Thanks!  The attached patch addresses your comments here and in your prior 
> email.  In particular, this patch changes the tuple visibility logic to not 
> check tuples for which the inserting transaction aborted or is still in 
> progress, and to not check toast for tuples deleted in transactions older 
> than our transaction snapshot's xmin.  A list of toasted attributes which are 
> safe to check is compiled per main table page during the scan of the page, 
> then checked after the buffer lock on the main page is released.
>
> In the perhaps unusual case where verify_heapam() is called in a transaction 
> which has also added tuples to the table being checked, this patch's 
> visibility logic chooses not to check such tuples.  I'm on the fence about 
> this choice, and am mostly following your lead.  I like that this decision 
> maintains the invariant that we never check tuples which have not yet been 
> committed.
>
> The patch includes a bit of refactoring.  In the old code, heap_check() 
> performed clog bounds checking on xmin and xmax prior to calling 
> check_tuple_header_and_visibilty(), but I think that's not such a great 
> choice.  If the tuple header is garbled to have random bytes in the xmin and 
> xmax fields, and we can detect that situation because other tuple header 
> fields are garbled in detectable ways, I'd rather get a report about the 
> header being garbled than a report about the xmin or xmax being out of 
> bounds.  In the new code, the tuple header is checked first, then the 
> visibility is checked, then the tuple is checked against the current relation 
> description, then the tuple attributes are checked.  I think the layout is 
> easier to follow, too.

Hmm, so this got ~10x bigger from my version. Could you perhaps
separate it out into a series of patches for easier review? Say, one
that just fixes the visibility logic, and then a second to avoid doing
the TOAST check with a buffer lock held, and then more than that if
there are other pieces that make sense to separate out?

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: pg_amcheck contrib application

2021-03-29 Thread Mark Dilger


> On Mar 24, 2021, at 1:46 PM, Robert Haas  wrote:
> 
> Mark,
> 
> Here's a quick and very dirty sketch of what I think perhaps this
> logic could look like. This is pretty much untested and it might be
> buggy, but at least you can see whether we're thinking at all in the
> same direction.

Thanks!  The attached patch addresses your comments here and in your prior 
email.  In particular, this patch changes the tuple visibility logic to not 
check tuples for which the inserting transaction aborted or is still in 
progress, and to not check toast for tuples deleted in transactions older than 
our transaction snapshot's xmin.  A list of toasted attributes which are safe 
to check is compiled per main table page during the scan of the page, then 
checked after the buffer lock on the main page is released.

In the perhaps unusual case where verify_heapam() is called in a transaction 
which has also added tuples to the table being checked, this patch's visibility 
logic chooses not to check such tuples.  I'm on the fence about this choice, 
and am mostly following your lead.  I like that this decision maintains the 
invariant that we never check tuples which have not yet been committed.

The patch includes a bit of refactoring.  In the old code, heap_check() 
performed clog bounds checking on xmin and xmax prior to calling 
check_tuple_header_and_visibilty(), but I think that's not such a great choice. 
 If the tuple header is garbled to have random bytes in the xmin and xmax 
fields, and we can detect that situation because other tuple header fields are 
garbled in detectable ways, I'd rather get a report about the header being 
garbled than a report about the xmin or xmax being out of bounds.  In the new 
code, the tuple header is checked first, then the visibility is checked, then 
the tuple is checked against the current relation description, then the tuple 
attributes are checked.  I think the layout is easier to follow, too.




v12-0001-Fix-visibility-and-locking-issues.patch
Description: Binary data


—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company





Re: pg_amcheck contrib application

2021-03-24 Thread Robert Haas
Mark,

Here's a quick and very dirty sketch of what I think perhaps this
logic could look like. This is pretty much untested and it might be
buggy, but at least you can see whether we're thinking at all in the
same direction.

-- 
Robert Haas
EDB: http://www.enterprisedb.com


very-rough-visibility-ideas.patch
Description: Binary data


Re: pg_amcheck contrib application

2021-03-24 Thread Robert Haas
On Wed, Mar 24, 2021 at 9:12 AM Robert Haas  wrote:
> On Wed, Mar 24, 2021 at 2:13 AM Mark Dilger
>  wrote:
> > The visibility rules fix is different in v11, relying on a visibility check 
> > which more closely follows the implementation of 
> > HeapTupleSatisfiesVacuumHorizon.
>
> Hmm. The header comment you wrote says "If a tuple might not be
> visible to any running transaction, then we must not check it." But, I
> don't find that statement very clear: does it mean "if there could be
> even one transaction to which this tuple is not visible, we must not
> check it"? Or does it mean "if the number of transactions that can see
> this tuple could potentially be zero, then we must not check it"? I
> don't think either of those is actually what we care about. I think
> what we should be saying is "if the tuple could have been inserted by
> a transaction that also added a column to the table, but which
> ultimately did not commit, then the table's current TupleDesc might
> differ from the one used to construct this tuple, so we must not check
> it."

Hit send too soon. And I was wrong, too. Wahoo. Thinking about the
buildfarm failure, I realized that there's a second danger here,
unrelated to the possibility of different TupleDescs, which we talked
about before: if the tuple is dead, we can't safely follow any TOAST
pointers, because the TOAST chunks might disappear at any time. So
technically we could split the return value up into something
three-way: if the inserter is known to have committed, we can check
the tuple itself, because the TupleDesc has to be reasonable. And, if
the tuple is known not to be dead already, and known not to be in a
state where it could become dead while we're doing stuff, we can
follow the TOAST pointer. I'm not sure whether it's worth trying to be
that fancy or not.

If we were only concerned about the mismatched-TupleDesc problem, this
function could return true in a lot more cases. Once we get to the
comment that says "Okay, the inserter committed..." we could just
return true. Similarly, the HEAP_MOVED_IN and HEAP_MOVED_OFF cases
could just skip all the interior test and return true, because if the
tuple is being moved, the original inserter has to have committed.
Conversely, however, the !HeapTupleHeaderXminCommitted ->
TransactionIdIsCurrentTransactionId case probably ought to always
return false. One could argue otherwise: if we're the inserter, then
the only in-progress transaction that might have changed the TupleDesc
is us, so we could just consider this case to be a true return value
also, regardless of what's going on with xmax. In that case, we're not
asking "did the inserter definitely commit?" but "are the inserter's
possible DDL changes definitely visible to us?" which might be an OK
definition too.

However, the could-the-TOAST-data-disappear problem is another story.
I don't see how we can answer that question correctly with the logic
you've got here, because you have no XID threshold. Consider the case
where we reach this code:

+   if (!(tuphdr->t_infomask & HEAP_XMAX_COMMITTED))
+   {
+       if (TransactionIdIsInProgress(HeapTupleHeaderGetRawXmax(tuphdr)))
+           return true;    /* HEAPTUPLE_DELETE_IN_PROGRESS */
+       else if (!TransactionIdDidCommit(HeapTupleHeaderGetRawXmax(tuphdr)))
+
+           /*
+            * Not in Progress, Not Committed, so either Aborted or crashed
+            */
+           return true;    /* HEAPTUPLE_LIVE */
+
+       /* At this point the xmax is known committed */
+   }

If we reach the case where the code comment says
HEAPTUPLE_DELETE_IN_PROGRESS, we know that the tuple isn't dead right
now, and so the TOAST tuples aren't dead either. But, by the time we
go try to look at the TOAST tuples, they might have become dead and
been pruned away, because the deleting transaction can commit at any
time, and after that pruning can happen at any time. Our only
guarantee that that won't happen is if the deleting XID is new enough
that it's invisible to some snapshot that our backend has registered.
That's approximately why HeapTupleSatisfiesVacuumHorizon needs to set
*dead_after in this case and one other, and I think you have the same
requirement.

I just noticed that this whole thing has another, related problem:
check_tuple_header_and_visibilty() and check_tuple_attribute() are
called from within check_tuple(), which is called while we hold a
buffer lock on the heap page. We should not be going and doing complex
operations that might take their own buffer locks - like TOAST index
checks - while we're holding an lwlock. That's going to have to be
changed so that the TOAST pointer checking happens after
UnlockReleaseBuffer(); in other words, we'll need to remember the
TOAST pointers to go look up and actually look them up after
UnlockReleaseBuffer(). But, when we do that, then the HEAPTUPLE_LIVE
case above has the 

Re: pg_amcheck contrib application

2021-03-24 Thread Robert Haas
On Wed, Mar 24, 2021 at 2:13 AM Mark Dilger
 wrote:
> The visibility rules fix is different in v11, relying on a visibility check 
> which more closely follows the implementation of 
> HeapTupleSatisfiesVacuumHorizon.

Hmm. The header comment you wrote says "If a tuple might not be
visible to any running transaction, then we must not check it." But, I
don't find that statement very clear: does it mean "if there could be
even one transaction to which this tuple is not visible, we must not
check it"? Or does it mean "if the number of transactions that can see
this tuple could potentially be zero, then we must not check it"? I
don't think either of those is actually what we care about. I think
what we should be saying is "if the tuple could have been inserted by
a transaction that also added a column to the table, but which
ultimately did not commit, then the table's current TupleDesc might
differ from the one used to construct this tuple, so we must not check
it."

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: pg_amcheck contrib application

2021-03-24 Thread Mark Dilger


> On Mar 17, 2021, at 9:00 PM, Mark Dilger  wrote:
> 
> Of the toast pointer fields:
> 
>int32   va_rawsize; /* Original data size (includes header) */
>int32   va_extsize; /* External saved size (doesn't) */
>Oid va_valueid; /* Unique ID of value within TOAST table */
>Oid va_toastrelid;  /* RelID of TOAST table containing it */
> 
> all seem worth getting as part of any toast error message, even if these 
> fields themselves are not corrupt.  It just makes it easier to understand the 
> context of the error you're looking at. At first I tried putting these into 
> each message, but it is very wordy to say things like "toast pointer with 
> rawsize %u and extsize %u pointing at relation with oid %u" and such.  It 
> made more sense to just add these four fields to the verify_heapam tuple 
> format.  That saves putting them in the message text itself, and has the 
> benefit that you could filter the rows coming from verify_heapam() for ones 
> where valueid is or is not null, for example.  This changes the external 
> interface of verify_heapam, but I didn't bother with a amcheck--1.3--1.4.sql 
> because amcheck--1.2--1.3. sql was added as part of the v14 development work 
> and has not yet been released.  My assumption is that I can just change it, 
> rather than making a new upgrade file.
> 
> These patches fix the visibility rules and add extra toast checking. 

These new patches address the same issues as v9 (which was never committed), 
and v10 (which was never even posted to this list), with some changes.

Rather than print out all four toast pointer fields for each toast failure, 
va_rawsize, va_extsize, and va_toastrelid are only mentioned in the corruption 
message if they are related to the specific corruption.  Otherwise, just the 
va_valueid is mentioned in the corruption message.

The visibility rules fix is different in v11, relying on a visibility check 
which more closely follows the implementation of 
HeapTupleSatisfiesVacuumHorizon.



v11-0001-Fixing-amcheck-tuple-visibility-rules.patch
Description: Binary data


v11-0002-pg_amcheck-extend-toast-corruption-reports.patch
Description: Binary data
 

—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company





Re: pg_amcheck contrib application

2021-03-23 Thread Peter Geoghegan
On Tue, Mar 23, 2021 at 12:53 PM Peter Geoghegan  wrote:
> One of the advantages of this design is that we verify practically all
> of the work involved in deleting an entire subtree up-front, inside
> _bt_lock_subtree_parent(). It's clearly safe to back out of it if it
> looks dicey.

That's taken care of. I just pushed a commit that teaches
_bt_lock_subtree_parent() to press on.

-- 
Peter Geoghegan




Re: pg_amcheck contrib application

2021-03-23 Thread Peter Geoghegan
On Tue, Mar 23, 2021 at 12:44 PM Tom Lane  wrote:
> > I will make this change to HEAD soon, barring objections.
>
> +1.  Not deleting the upper page seems better than the alternatives.

FWIW it might also work that way as a holdover from the old page
deletion algorithm. These days we decide exactly which pages (leaf
page plus possible internal pages) are to be deleted as a whole up
front (these are a subtree, though usually just a degenerate
single-leaf-page subtree -- internal page deletions are rare).

One of the advantages of this design is that we verify practically all
of the work involved in deleting an entire subtree up-front, inside
_bt_lock_subtree_parent(). It's clearly safe to back out of it if it
looks dicey.

--
Peter Geoghegan




Re: pg_amcheck contrib application

2021-03-23 Thread Tom Lane
Peter Geoghegan  writes:
> That being said, I should make _bt_lock_subtree_parent() return false
> and back out of page deletion without raising an error in the case
> where we really cannot locate a valid downlink. We really ought to
> soldier on when that happens, since we'll do that for a bunch of other
> reasons already. I believe that the only reason we throw an error
> today is for parity with the page split case (the main
> _bt_getstackbuf() call). But this isn't the same situation at all --
> this is VACUUM.

> I will make this change to HEAD soon, barring objections.

+1.  Not deleting the upper page seems better than the alternatives.

regards, tom lane




Re: pg_amcheck contrib application

2021-03-23 Thread Peter Geoghegan
On Tue, Mar 23, 2021 at 12:05 PM Robert Haas  wrote:
> Right, good point. But... does that really apply to
> 005_opclass_damage.pl? I feel like with the kind of physical damage
> we're doing in 003_check.pl, it makes a lot of sense to stop vacuum
> from crashing headlong into that table. But, 005 is doing "logical"
> damage rather than "physical" damage, and I don't see why autovacuum
> should misbehave in that kind of case. In fact, the fact that
> autovacuum can handle such cases is one of the selling points for the
> whole design of vacuum, as opposed to, for example, retail index
> lookups.

FWIW that is only 99.9% true (contrary to what README.HOT says). This
is the case because nbtree page deletion will in fact search the tree
to find a downlink to the target page, which must be removed at the
same time -- see the call to _bt_search() made within nbtpage.c.

This is much less of a problem than you'd think, though, even with an
opclass that gives wrong answers all the time. Because it's also true
that _bt_getstackbuf() is remarkably tolerant when it actually goes to
locate the downlink -- because that happens via a linear search that
matches on downlink block number (it doesn't use the opclass for that
part). This means that we'll accidentally fail to fail if the page is
*somewhere* to the right in the "true" key space. Which probably means
that it has a greater than 50% chance of not failing with a 100%
broken opclass. Which probably makes our odds better with more
plausible levels of misbehavior (e.g. collation changes).

That being said, I should make _bt_lock_subtree_parent() return false
and back out of page deletion without raising an error in the case
where we really cannot locate a valid downlink. We really ought to
soldier on when that happens, since we'll do that for a bunch of other
reasons already. I believe that the only reason we throw an error
today is for parity with the page split case (the main
_bt_getstackbuf() call). But this isn't the same situation at all --
this is VACUUM.

I will make this change to HEAD soon, barring objections.

-- 
Peter Geoghegan




Re: pg_amcheck contrib application

2021-03-23 Thread Mark Dilger



> On Mar 23, 2021, at 12:05 PM, Robert Haas  wrote:
> 
> 005 is doing "logical"
> damage rather than "physical" damage, and I don't see why autovacuum
> should misbehave in that kind of case. In fact, the fact that
> autovacuum can handle such cases is one of the selling points for the
> whole design of vacuum, as opposed to, for example, retail index
> lookups.

That is a good point.  Checking that autovacuum behaves sensibly despite sort 
order breakage sounds reasonable, but test 005 doesn't do that reliably, 
because it does nothing to make sure that autovacuum runs against the affected 
table during the short window when the affected table exists.  All the same, I 
don't see that turning autovacuum off is required.  If autovacuum is broken in 
this regard, we may get occasional, hard to reproduce build farm failures, but 
that would be more informative than no failures at all.

—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company







Re: pg_amcheck contrib application

2021-03-23 Thread Robert Haas
On Thu, Mar 18, 2021 at 12:12 AM Tom Lane  wrote:
> Mark Dilger  writes:
> >> On Mar 16, 2021, at 12:52 PM, Robert Haas  wrote:
> >> Since we now know that shutting autovacuum off makes the problem go
> >> away, I don't see a reason to commit 0001. We should fix pg_amcheck
> >> instead, if, as presently seems to be the case, that's where the
> >> problem is.
>
> > If you get unlucky, autovacuum will process one of the tables that the test 
> > intentionally corrupted, with bad consequences, ultimately causing build 
> > farm intermittent test failures.
>
> Um, yeah, the test had better shut off autovacuum on any table that
> it intentionally corrupts.

Right, good point. But... does that really apply to
005_opclass_damage.pl? I feel like with the kind of physical damage
we're doing in 003_check.pl, it makes a lot of sense to stop vacuum
from crashing headlong into that table. But, 005 is doing "logical"
damage rather than "physical" damage, and I don't see why autovacuum
should misbehave in that kind of case. In fact, the fact that
autovacuum can handle such cases is one of the selling points for the
whole design of vacuum, as opposed to, for example, retail index
lookups.

Pending resolution of that question, I've committed the change to
disable autovacuum in 003, and also Mark's changes to have it also run
pg_amcheck BEFORE corrupting anything, so the post-corruption tests
fail - say by finding the wrong kind of corruption - we can see
whether it was also failing before the corruption was even introduced.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: pg_amcheck contrib application

2021-03-17 Thread Tom Lane
Mark Dilger  writes:
>> On Mar 16, 2021, at 12:52 PM, Robert Haas  wrote:
>> Since we now know that shutting autovacuum off makes the problem go
>> away, I don't see a reason to commit 0001. We should fix pg_amcheck
>> instead, if, as presently seems to be the case, that's where the
>> problem is.

> If you get unlucky, autovacuum will process one of the tables that the test 
> intentionally corrupted, with bad consequences, ultimately causing build farm 
> intermittent test failures.

Um, yeah, the test had better shut off autovacuum on any table that
it intentionally corrupts.

regards, tom lane




Re: pg_amcheck contrib application

2021-03-17 Thread Mark Dilger


> On Mar 16, 2021, at 12:52 PM, Robert Haas  wrote:
> 
> On Mon, Mar 15, 2021 at 10:10 PM Mark Dilger
>  wrote:
>> It is unfortunate that the failing test only runs pg_amcheck after creating 
>> numerous corruptions, as we can't know if pg_amcheck would have complained 
>> about pg_statistic before the corruptions were created in other tables, or 
>> if it only does so after.  The attached patch v7-0003 adds a call to 
>> pg_amcheck after all tables are created and populated, but before any 
>> corruptions are caused.  This should help narrow down what is happening, and 
>> doesn't hurt to leave in place long-term.
>> 
>> I don't immediately see anything wrong with how pg_statistic uses a 
>> pseudo-type, but it leads me to want to poke a bit more at pg_statistic on 
>> hornet and tern, though I don't have any regression tests specifically for 
>> doing so.
>> 
>> Tests v7-0001 and v7-0002 are just repeats of the tests posted previously.
> 
> Since we now know that shutting autovacuum off makes the problem go
> away, I don't see a reason to commit 0001. We should fix pg_amcheck
> instead, if, as presently seems to be the case, that's where the
> problem is.

If you get unlucky, autovacuum will process one of the tables that the test 
intentionally corrupted, with bad consequences, ultimately causing build farm 
intermittent test failures.  We could wait to see if this ever happens before 
fixing it, if you prefer.  I'm not sure what that buys us, though.

The right approach, I think, is to extend the contrib/amcheck tests to include 
regressing this particular case to see if it fails.  I've written that test and 
verified that it fails against the old code and passes against the new.

> I just committed 0002.

Thanks!

> I think 0003 is probably a good idea, but I haven't committed it yet.

It won't do anything for us in this particular case, but it might make 
debugging failures quicker in the future.

> As for 0004, it seems to me that we might want to do a little more
> rewording of these messages and perhaps we should try to do it all at
> once. Like, for example, your other change to print out the toast
> value ID seems like a good idea, and could apply to any new messages
> as well as some existing ones. Maybe there are also more fields in the
> TOAST pointer for which we could add checks.

Of the toast pointer fields:

int32   va_rawsize; /* Original data size (includes header) */
int32   va_extsize; /* External saved size (doesn't) */
Oid va_valueid; /* Unique ID of value within TOAST table */
Oid va_toastrelid;  /* RelID of TOAST table containing it */

all seem worth getting as part of any toast error message, even if these fields 
themselves are not corrupt.  It just makes it easier to understand the context 
of the error you're looking at.  At first I tried putting these into each 
message, but it is very wordy to say things like "toast pointer with rawsize %u 
and extsize %u pointing at relation with oid %u" and such.  It made more sense 
to just add these four fields to the verify_heapam tuple format.  That saves 
putting them in the message text itself, and has the benefit that you could 
filter the rows coming from verify_heapam() for ones where valueid is or is not 
null, for example.  This changes the external interface of verify_heapam, but I 
didn't bother with an amcheck--1.3--1.4.sql because amcheck--1.2--1.3.sql was 
added as part of the v14 development work and has not yet been released.  My 
assumption is that I can just change it, rather than making a new upgrade file.

These patches fix the visibility rules and add extra toast checking.  Some of 
the previous patch material is not included, since it is not clear to me if you 
wanted to commit any of it.




v9-0001-Fixing-amcheck-tuple-visibility-rules.patch
Description: Binary data


v9-0002-pg_amcheck-provide-additional-toast-corruption-in.patch
Description: Binary data
 

—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company





Re: pg_amcheck contrib application

2021-03-16 Thread Robert Haas
On Mon, Mar 15, 2021 at 10:10 PM Mark Dilger
 wrote:
> It is unfortunate that the failing test only runs pg_amcheck after creating 
> numerous corruptions, as we can't know if pg_amcheck would have complained 
> about pg_statistic before the corruptions were created in other tables, or if 
> it only does so after.  The attached patch v7-0003 adds a call to pg_amcheck 
> after all tables are created and populated, but before any corruptions are 
> caused.  This should help narrow down what is happening, and doesn't hurt to 
> leave in place long-term.
>
> I don't immediately see anything wrong with how pg_statistic uses a 
> pseudo-type, but it leads me to want to poke a bit more at pg_statistic on 
> hornet and tern, though I don't have any regression tests specifically for 
> doing so.
>
> Tests v7-0001 and v7-0002 are just repeats of the tests posted previously.

Since we now know that shutting autovacuum off makes the problem go
away, I don't see a reason to commit 0001. We should fix pg_amcheck
instead, if, as presently seems to be the case, that's where the
problem is.

I just committed 0002.

I think 0003 is probably a good idea, but I haven't committed it yet.

As for 0004, it seems to me that we might want to do a little more
rewording of these messages and perhaps we should try to do it all at
once. Like, for example, your other change to print out the toast
value ID seems like a good idea, and could apply to any new messages
as well as some existing ones. Maybe there are also more fields in the
TOAST pointer for which we could add checks.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: pg_amcheck contrib application

2021-03-16 Thread Mark Dilger



> On Mar 16, 2021, at 11:40 AM, Robert Haas  wrote:
> 
> On Tue, Mar 16, 2021 at 2:22 PM Tom Lane  wrote:
>> I'm circling back around to the idea that amcheck is trying to
>> validate TOAST references that are already dead, and it's getting
>> burnt because something-or-other has already removed the toast
>> rows, though not the referencing datums.  That's legal behavior
>> once the rows are marked dead.  Upthread it was claimed that
>> amcheck isn't doing that, but this looks like a smoking gun to me.
> 
> I think this theory has some legs. From check_tuple_header_and_visibilty():
> 
>         else if (!(infomask & HEAP_XMAX_COMMITTED))
>             return true;    /* HEAPTUPLE_DELETE_IN_PROGRESS or
>                              * HEAPTUPLE_LIVE */
>         else
>             return false;   /* HEAPTUPLE_RECENTLY_DEAD or HEAPTUPLE_DEAD */
>     }
>     return true;            /* not dead */
> }
> 
> That first case looks wrong to me. Don't we need to call
> get_xid_status() here, Mark? As coded, it seems that if the xmin is ok
> and the xmax is marked committed, we consider the tuple checkable. The
> comment says it must be HEAPTUPLE_DELETE_IN_PROGRESS or
> HEAPTUPLE_LIVE, but it seems to me that the actual answer is either
> HEAPTUPLE_RECENTLY_DEAD or HEAPTUPLE_DEAD, depending on whether the
> xmax is all-visible. And in the second case the comment says it's
> either HEAPTUPLE_RECENTLY_DEAD or HEAPTUPLE_DEAD, but I think in that
> case it's either HEAPTUPLE_DELETE_IN_PROGRESS or
> HEAPTUPLE_RECENTLY_DEAD or HEAPTUPLE_DEAD, depending on the XID
> status.
> 
> Another thought here is that it's probably not wicked smart to be
> relying on the hint bits to match the actual status of the tuple in
> cases of corruption. Maybe we should be warning about tuples that are
> have xmin or xmax flagged as committed or invalid when in fact clog
> disagrees. That's not a particularly uncommon case, and it's hard to
> check.

This code was not committed as part of the recent pg_amcheck work, but longer 
ago, and I'm having trouble reconstructing exactly why it was written that way.

Changing check_tuple_header_and_visibilty() fixes the regression test and also 
manual tests against the "regression" database that I've been using.  I'd like 
to ponder the changes a while longer before I post, but the fact that these 
changes fix the tests seems to add credibility to this theory.


—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company







Re: pg_amcheck contrib application

2021-03-16 Thread Andrew Dunstan


On 3/13/21 1:30 AM, Andres Freund wrote:
> Hi,
>
> On 2021-03-13 01:22:54 -0500, Tom Lane wrote:
>> Mark Dilger  writes:
>>> On Mar 12, 2021, at 10:16 PM, Noah Misch  wrote:
 hoverfly does configure with PERL=perl64.  /usr/bin/prove is from the 
 32-bit
 Perl, so I suspect the TAP suites get 32-bit Perl that way.  (There's no
 "prove64".)
>> Oh, that's annoying.
> I suspect we could solve that by invoking changing our /usr/bin/prove
> invocation to instead be PERL /usr/bin/prove? That might be a good thing
> independent of this issue, because it's not unreasonable for a user to
> expect that we'd actually use the perl installation they configured...
>
> Although I do not know how prove determines the perl installation it's
> going to use for the test scripts...
>


There's a very good chance this would break msys builds, which are
configured to build against a pure native (i.e. non-msys) perl such as
AS or Strawberry, but need to run msys perl for TAP tests, so it gets
the paths right.

(Don't get me started on the madness that can result from managing this.
I've lost weeks of my life to it ... If you add cygwin into the mix and
you're trying to coordinate builds among three buildfarm animals it's a
major pain.)


cheers


andrew

--
Andrew Dunstan
EDB: https://www.enterprisedb.com





Re: pg_amcheck contrib application

2021-03-16 Thread Robert Haas
On Tue, Mar 16, 2021 at 2:45 PM Tom Lane  wrote:
> > In what context is it OK to just add extra alignment padding?
>
> It's *not* extra, according to pg_statistic's tuple descriptor.
> Both forming and deforming of pg_statistic tuples should honor
> that and locate stavaluesX values on d-aligned boundaries.
>
> It could be that a particular entry is of an array type that
> only requires i-alignment.  But that doesn't break anything,
> it just means we inserted more padding than an omniscient
> implementation would do.

OK, yeah, I just misunderstood what you were saying.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: pg_amcheck contrib application

2021-03-16 Thread Tom Lane
Mark Dilger  writes:
> CopyStatistics seems to just copy Form_pg_statistic without regard for
> who owns the toast.  Is this safe?

No less so than a ton of other places that insert values that might've
come from on-disk storage.  heap_toast_insert_or_update() is responsible
for dealing with the problem.  These days it looks like it's actually
toast_tuple_init() that takes care of dereferencing previously-toasted
values during an INSERT.

regards, tom lane




Re: pg_amcheck contrib application

2021-03-16 Thread Tom Lane
Robert Haas  writes:
> On Tue, Mar 16, 2021 at 1:48 PM Tom Lane  wrote:
>> No.  What should be happening there is that some arrays in the column
>> get larger alignment than they actually need, but that shouldn't cause
>> a problem (and has not done so, AFAIK, in the decades that it's been
>> like this).  As you say, deforming the tuple is going to rely on the
>> table's tupdesc for alignment; it can't know what is in the data.

> OK, I don't understand this. attalign is 'd', which is already the
> maximum possible, and even if it weren't, individual rows can't decide
> to use a larger OR smaller alignment than expected without breaking
> stuff.

> In what context is it OK to just add extra alignment padding?

It's *not* extra, according to pg_statistic's tuple descriptor.
Both forming and deforming of pg_statistic tuples should honor
that and locate stavaluesX values on d-aligned boundaries.

It could be that a particular entry is of an array type that
only requires i-alignment.  But that doesn't break anything,
it just means we inserted more padding than an omniscient
implementation would do.

(I suppose in some parallel universe there could be a machine
where i-alignment is stricter than d-alignment, and then we'd
have trouble.)
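The point that over-alignment is harmless so long as forming and deforming 
agree can be seen with the usual align-up computation; this is a standalone 
sketch in the style of PostgreSQL's TYPEALIGN macro, not the macro itself:

```c
#include <assert.h>
#include <stdint.h>

/* Round offset up to the next multiple of alignval (a power of two),
 * in the style of PostgreSQL's TYPEALIGN macro. */
static uintptr_t
align_up(uintptr_t offset, uintptr_t alignval)
{
	return (offset + alignval - 1) & ~(alignval - 1);
}
```

If both sides use the tupdesc's 'd' (8-byte) alignment, a value starting after 
10 bytes is formed at offset 16 and deformed at offset 16, even though 'i' 
(4-byte) alignment would have placed it at 12; the extra padding is wasted but 
never misread.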

regards, tom lane




Re: pg_amcheck contrib application

2021-03-16 Thread Robert Haas
On Tue, Mar 16, 2021 at 2:22 PM Tom Lane  wrote:
> I'm circling back around to the idea that amcheck is trying to
> validate TOAST references that are already dead, and it's getting
> burnt because something-or-other has already removed the toast
> rows, though not the referencing datums.  That's legal behavior
> once the rows are marked dead.  Upthread it was claimed that
> amcheck isn't doing that, but this looks like a smoking gun to me.

I think this theory has some legs. From check_tuple_header_and_visibilty():

        else if (!(infomask & HEAP_XMAX_COMMITTED))
            return true;    /* HEAPTUPLE_DELETE_IN_PROGRESS or
                             * HEAPTUPLE_LIVE */
        else
            return false;   /* HEAPTUPLE_RECENTLY_DEAD or HEAPTUPLE_DEAD */
    }
    return true;            /* not dead */
}

That first case looks wrong to me. Don't we need to call
get_xid_status() here, Mark? As coded, it seems that if the xmin is ok
and the xmax is marked committed, we consider the tuple checkable. The
comment says it must be HEAPTUPLE_DELETE_IN_PROGRESS or
HEAPTUPLE_LIVE, but it seems to me that the actual answer is either
HEAPTUPLE_RECENTLY_DEAD or HEAPTUPLE_DEAD, depending on whether the
xmax is all-visible. And in the second case the comment says it's
either HEAPTUPLE_RECENTLY_DEAD or HEAPTUPLE_DEAD, but I think in that
case it's either HEAPTUPLE_DELETE_IN_PROGRESS or
HEAPTUPLE_RECENTLY_DEAD or HEAPTUPLE_DEAD, depending on the XID
status.

Another thought here is that it's probably not wicked smart to be
relying on the hint bits to match the actual status of the tuple in
cases of corruption. Maybe we should be warning about tuples that are
have xmin or xmax flagged as committed or invalid when in fact clog
disagrees. That's not a particularly uncommon case, and it's hard to
check.
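As a sketch of what such a cross-check could look like, the following 
standalone code flags a tuple whose xmin hint bits contradict a (mock) clog 
lookup.  The flag values and names here are assumptions for illustration only; 
the real infomask bits live in htup_details.h.

```c
#include <assert.h>
#include <stdint.h>

#define SK_XMIN_COMMITTED	0x01	/* illustrative stand-ins, not the */
#define SK_XMIN_INVALID		0x02	/* real HEAP_XMIN_* infomask bits  */

typedef enum { CLOG_IN_PROGRESS, CLOG_COMMITTED, CLOG_ABORTED } ClogStatus;

/* Return 1 when the hint bits claim a commit status that the clog lookup
 * contradicts, which a distrustful verifier could report as corruption. */
static int
xmin_hint_disagrees_with_clog(uint16_t infomask, ClogStatus actual)
{
	if ((infomask & SK_XMIN_COMMITTED) && actual != CLOG_COMMITTED)
		return 1;
	if ((infomask & SK_XMIN_INVALID) && actual != CLOG_ABORTED)
		return 1;
	return 0;
}
```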

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: pg_amcheck contrib application

2021-03-16 Thread Mark Dilger



> On Mar 16, 2021, at 10:48 AM, Tom Lane  wrote:
> 
> I'm not entirely sure what's going on, but I think coming at this
> with the mindset that "amcheck has detected some corruption" is
> just going to lead you astray.  Almost certainly, it's "amcheck
> is incorrectly claiming corruption".  That might be from mis-decoding
> a TOAST-referencing datum.  (Too bad the message doesn't report the
> TOAST OID it probed for, so we can see if that's sane or not.)

CopyStatistics seems to just copy Form_pg_statistic without regard for who owns 
the toast.  Is this safe?  Looking at RemoveStatistics, I'm not sure that it 
is.  Anybody more familiar with this code care to give an opinion?

—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company







Re: pg_amcheck contrib application

2021-03-16 Thread Robert Haas
On Tue, Mar 16, 2021 at 1:48 PM Tom Lane  wrote:
> No.  What should be happening there is that some arrays in the column
> get larger alignment than they actually need, but that shouldn't cause
> a problem (and has not done so, AFAIK, in the decades that it's been
> like this).  As you say, deforming the tuple is going to rely on the
> table's tupdesc for alignment; it can't know what is in the data.

OK, I don't understand this. attalign is 'd', which is already the
maximum possible, and even if it weren't, individual rows can't decide
to use a larger OR smaller alignment than expected without breaking
stuff.

In what context is it OK to just add extra alignment padding?

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: pg_amcheck contrib application

2021-03-16 Thread Tom Lane
Mark Dilger  writes:
> On Mar 16, 2021, at 10:48 AM, Tom Lane  wrote:
>> (Too bad the message doesn't report the
>> TOAST OID it probed for, so we can see if that's sane or not.)

> I've added that and now get the toast pointer's va_valueid in the message:

> heap table "postgres"."pg_catalog"."pg_statistic", block 4, offset 2, 
> attribute 29:
> toasted value id 13227 for attribute 29 missing from toast table
> heap table "postgres"."pg_catalog"."pg_statistic", block 4, offset 5, 
> attribute 27:
> toasted value id 13228 for attribute 27 missing from toast table

That's awfully interesting, because OIDs less than 16384 should
only be generated during initdb.  So what we seem to be looking at
here is pg_statistic entries that were made during the ANALYZE
done by initdb (cf. vacuum_db()), and then were deleted during
a later auto-analyze (in the buildfarm) or deliberate analyze
(in your reproducer).  But if they're deleted, why is amcheck
looking for them?

I'm circling back around to the idea that amcheck is trying to
validate TOAST references that are already dead, and it's getting
burnt because something-or-other has already removed the toast
rows, though not the referencing datums.  That's legal behavior
once the rows are marked dead.  Upthread it was claimed that
amcheck isn't doing that, but this looks like a smoking gun to me.

regards, tom lane




Re: pg_amcheck contrib application

2021-03-16 Thread Mark Dilger



> On Mar 16, 2021, at 10:48 AM, Tom Lane  wrote:
> 
> Robert Haas  writes:
>> On Tue, Mar 16, 2021 at 12:51 PM Mark Dilger
>>  wrote:
>>> It shows them all as having attalign = 'd', but for some array types the 
>>> alignment will be 'i', not 'd'.  So it's lying a bit about the contents.  
>>> But I'm now confused why this caused problems on the two hosts where 
>>> integer and double have the same alignment?  It seems like that would be 
>>> the one place where the bug would not happen, not the one place where it 
>>> does.
> 
>> Wait, so the value in the attalign column isn't the alignment we're
>> actually using? I can understand how we might generate tuples like
>> that, if we pass the actual type to construct_array(), but shouldn't
>> we then get garbage when we deform the tuple?
> 
> No.  What should be happening there is that some arrays in the column
> get larger alignment than they actually need, but that shouldn't cause
> a problem (and has not done so, AFAIK, in the decades that it's been
> like this).  As you say, deforming the tuple is going to rely on the
> table's tupdesc for alignment; it can't know what is in the data.
> 
> I'm not entirely sure what's going on, but I think coming at this
> with the mindset that "amcheck has detected some corruption" is
> just going to lead you astray.  Almost certainly, it's "amcheck
> is incorrectly claiming corruption".  That might be from mis-decoding
> a TOAST-referencing datum.  (Too bad the message doesn't report the
> TOAST OID it probed for, so we can see if that's sane or not.)

I've added that and now get the toast pointer's va_valueid in the message:

mark.dilger@laptop280-ma-us amcheck % pg_amcheck -t "pg_catalog.pg_statistic" 
postgres
heap table "postgres"."pg_catalog"."pg_statistic", block 4, offset 2, attribute 
29:
toasted value id 13227 for attribute 29 missing from toast table
heap table "postgres"."pg_catalog"."pg_statistic", block 4, offset 5, attribute 
27:
toasted value id 13228 for attribute 27 missing from toast table

diff --git a/contrib/amcheck/verify_heapam.c b/contrib/amcheck/verify_heapam.c
index 5ccae46375..a0be71bb7f 100644
--- a/contrib/amcheck/verify_heapam.c
+++ b/contrib/amcheck/verify_heapam.c
@@ -,8 +,8 @@ check_tuple_attribute(HeapCheckContext *ctx)
 	}
 	if (!found_toasttup)
 		report_corruption(ctx,
-						  psprintf("toasted value for attribute %u missing from toast table",
-								   ctx->attnum));
+						  psprintf("toasted value id %u for attribute %u missing from toast table",
+								   toast_pointer.va_valueid, ctx->attnum));
 	else if (ctx->chunkno != (ctx->endchunk + 1))
 		report_corruption(ctx,
 						  psprintf("final toast chunk number %u differs from expected value %u",

—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company







Re: pg_amcheck contrib application

2021-03-16 Thread Tom Lane
... btw, I now see that tern and hornet are passing this test
at least as much as they're failing, proving that there's some
timing or random chance involved.  That doesn't completely
eliminate the idea that there may be an architecture component
to the issue, but it sure reduces its credibility.  I now
believe the theory that the triggering condition is an auto-analyze
happening at the right time, and populating pg_statistic with
some data that other runs don't see.

regards, tom lane




Re: pg_amcheck contrib application

2021-03-16 Thread Tom Lane
Robert Haas  writes:
> On Tue, Mar 16, 2021 at 12:51 PM Mark Dilger
>  wrote:
>> It shows them all as having attalign = 'd', but for some array types the 
>> alignment will be 'i', not 'd'.  So it's lying a bit about the contents.  
>> But I'm now confused why this caused problems on the two hosts where integer 
>> and double have the same alignment?  It seems like that would be the one 
>> place where the bug would not happen, not the one place where it does.

> Wait, so the value in the attalign column isn't the alignment we're
> actually using? I can understand how we might generate tuples like
> that, if we pass the actual type to construct_array(), but shouldn't
> we then get garbage when we deform the tuple?

No.  What should be happening there is that some arrays in the column
get larger alignment than they actually need, but that shouldn't cause
a problem (and has not done so, AFAIK, in the decades that it's been
like this).  As you say, deforming the tuple is going to rely on the
table's tupdesc for alignment; it can't know what is in the data.

I'm not entirely sure what's going on, but I think coming at this
with the mindset that "amcheck has detected some corruption" is
just going to lead you astray.  Almost certainly, it's "amcheck
is incorrectly claiming corruption".  That might be from mis-decoding
a TOAST-referencing datum.  (Too bad the message doesn't report the
TOAST OID it probed for, so we can see if that's sane or not.)

regards, tom lane




Re: pg_amcheck contrib application

2021-03-16 Thread Robert Haas
On Tue, Mar 16, 2021 at 12:51 PM Mark Dilger
 wrote:
> Yeah, that looks related:
>
> regression=# select attname, attlen, attnum, attalign from pg_attribute where 
> attrelid = 2619 and attname like 'stavalue%';
>   attname   | attlen | attnum | attalign
> ------------+--------+--------+----------
>  stavalues1 |     -1 |     27 | d
>  stavalues2 |     -1 |     28 | d
>  stavalues3 |     -1 |     29 | d
>  stavalues4 |     -1 |     30 | d
>  stavalues5 |     -1 |     31 | d
> (5 rows)
>
> It shows them all as having attalign = 'd', but for some array types the 
> alignment will be 'i', not 'd'.  So it's lying a bit about the contents.  But 
> I'm now confused why this caused problems on the two hosts where integer and 
> double have the same alignment?  It seems like that would be the one place 
> where the bug would not happen, not the one place where it does.

Wait, so the value in the attalign column isn't the alignment we're
actually using? I can understand how we might generate tuples like
that, if we pass the actual type to construct_array(), but shouldn't
we then get garbage when we deform the tuple? I mean,
heap_deform_tuple() is going to get the alignment from the tuple
descriptor, which is a table property. It doesn't (and can't) know
what type of value is stored inside any of these fields for real,
right?

In other words, isn't this actually corruption, caused by a bug in our
code, and how have we not noticed it before now?

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: pg_amcheck contrib application

2021-03-16 Thread Mark Dilger


> On Mar 16, 2021, at 9:30 AM, Mark Dilger  wrote:
> 
> 
> 
>> On Mar 16, 2021, at 9:07 AM, Tom Lane  wrote:
>> 
>> Mark Dilger  writes:
>>> I think autovacuum simply triggers the bug, and is not the cause of the 
>>> bug.  If I turn autovacuum off and instead do an ANALYZE in each test 
>>> database rather than performing the corruptions, I get reports about 
>>> problems in pg_statistic.  This is on my mac laptop.  This rules out the 
>>> theory that autovacuum is propagating corruptions into pg_statistic, and 
>>> also the theory that it is architecture dependent.
>> 
>> I wonder whether amcheck is confused by the declaration of those columns
>> as "anyarray".
> 
> It uses attlen and attalign for the attribute, so that idea does make sense.  
> It gets that via TupleDescAttr(RelationGetDescr(rel), attnum).

Yeah, that looks related:

regression=# select attname, attlen, attnum, attalign from pg_attribute where 
attrelid = 2619 and attname like 'stavalue%';
  attname   | attlen | attnum | attalign 
------------+--------+--------+----------
 stavalues1 |     -1 |     27 | d
 stavalues2 |     -1 |     28 | d
 stavalues3 |     -1 |     29 | d
 stavalues4 |     -1 |     30 | d
 stavalues5 |     -1 |     31 | d
(5 rows)

It shows them all as having attalign = 'd', but for some array types the 
alignment will be 'i', not 'd'.  So it's lying a bit about the contents.  But 
I'm now confused why this caused problems on the two hosts where integer and 
double have the same alignment?  It seems like that would be the one place 
where the bug would not happen, not the one place where it does.

I'm attaching a test that reliably reproduces this for me:



v8-0005-Adding-pg_amcheck-special-test-for-pg_statistic.patch
Description: Binary data


—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company





Re: pg_amcheck contrib application

2021-03-16 Thread Mark Dilger



> On Mar 16, 2021, at 9:07 AM, Tom Lane  wrote:
> 
> Mark Dilger  writes:
>> I think autovacuum simply triggers the bug, and is not the cause of the bug. 
>>  If I turn autovacuum off and instead do an ANALYZE in each test database 
>> rather than performing the corruptions, I get reports about problems in 
>> pg_statistic.  This is on my mac laptop.  This rules out the theory that 
>> autovacuum is propagating corruptions into pg_statistic, and also the theory 
>> that it is architecture dependent.
> 
> I wonder whether amcheck is confused by the declaration of those columns
> as "anyarray".

It uses attlen and attalign for the attribute, so that idea does make sense.  
It gets that via TupleDescAttr(RelationGetDescr(rel), attnum).

—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company







Re: pg_amcheck contrib application

2021-03-16 Thread Tom Lane
Mark Dilger  writes:
> I think autovacuum simply triggers the bug, and is not the cause of the bug.  
> If I turn autovacuum off and instead do an ANALYZE in each test database 
> rather than performing the corruptions, I get reports about problems in 
> pg_statistic.  This is on my mac laptop.  This rules out the theory that 
> autovacuum is propagating corruptions into pg_statistic, and also the theory 
> that it is architecture dependent.

I wonder whether amcheck is confused by the declaration of those columns
as "anyarray".

regards, tom lane




Re: pg_amcheck contrib application

2021-03-16 Thread Mark Dilger



> On Mar 15, 2021, at 11:09 PM, Noah Misch  wrote:
> 
>> Not sure that I believe the theory that this is from bad luck of
>> concurrent autovacuum timing, though.
> 
> With autovacuum_naptime=1s, on hornet, the failure reproduced twice in twelve
> runs.  With v6-0001-Turning-off-autovacuum-during-corruption-tests.patch
> applied, 196 runs all succeeded.
> 
>> The fact that we're seeing
>> this on just those two animals suggests strongly to me that it's
>> architecture-correlated, instead.
> 
> That is possible.

I think autovacuum simply triggers the bug, and is not the cause of the bug.  
If I turn autovacuum off and instead do an ANALYZE in each test database rather 
than performing the corruptions, I get reports about problems in pg_statistic.  
This is on my mac laptop.  This rules out the theory that autovacuum is 
propagating corruptions into pg_statistic, and also the theory that it is 
architecture dependent.

I'll investigate further.

—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company







Re: pg_amcheck contrib application

2021-03-16 Thread Noah Misch
On Mon, Mar 15, 2021 at 02:57:20PM -0400, Tom Lane wrote:
> Mark Dilger  writes:
> > On Mar 15, 2021, at 10:04 AM, Tom Lane  wrote:
> >> These animals have somewhat weird alignment properties: MAXALIGN is 8
> >> but ALIGNOF_DOUBLE is only 4.  I speculate that that is affecting their
> >> choices about whether an out-of-line TOAST value is needed, breaking
> >> this test case.

That machine also has awful performance for filesystem metadata operations,
like open(O_CREAT).  Its CPU and read()/write() performance are normal.

> > The logic in verify_heapam only looks for a value in the toast table if
> > the tuple it gets from the main table (in this case, from pg_statistic)
> > has an attribute that claims to be toasted.  The error message we're
> > seeing that you quoted above simply means that no entry exists in the
> > toast table.
> 
> Yeah, that could be phrased better.  Do we have a strong enough lock on
> the table under examination to be sure that autovacuum couldn't remove
> a dead toast entry before we reach it?  But this would only be an
> issue if we are trying to check validity of toasted fields within
> known-dead tuples, which I would argue we shouldn't, since lock or
> no lock there's no guarantee the toast entry is still there.
> 
> Not sure that I believe the theory that this is from bad luck of
> concurrent autovacuum timing, though.

With autovacuum_naptime=1s, on hornet, the failure reproduced twice in twelve
runs.  With v6-0001-Turning-off-autovacuum-during-corruption-tests.patch
applied, 196 runs all succeeded.

> The fact that we're seeing
> this on just those two animals suggests strongly to me that it's
> architecture-correlated, instead.

That is possible.




Re: pg_amcheck contrib application

2021-03-15 Thread Mark Dilger


> On Mar 15, 2021, at 11:57 AM, Tom Lane  wrote:
> 
> Not sure that I believe the theory that this is from bad luck of
> concurrent autovacuum timing, though.  The fact that we're seeing
> this on just those two animals suggests strongly to me that it's
> architecture-correlated, instead.

I find it a little hard to see how amcheck is tripping over a toasted value 
just in this one table, pg_statistic, and not in any of the others.  The error 
message says the problem is in attribute 27, which I believe means it is 
stavalues2.  The comment in the header for this catalog is intriguing:

/*
 * Values in these arrays are values of the column's data type, or of some
 * related type such as an array element type.  We presently have to cheat
 * quite a bit to allow polymorphic arrays of this kind, but perhaps
 * someday it'll be a less bogus facility.
 */
	anyarray	stavalues1;
	anyarray	stavalues2;
	anyarray	stavalues3;
	anyarray	stavalues4;
	anyarray	stavalues5;

This is hard to duplicate in a test, because you can't normally create tables 
with pseudo-type columns.  However, if amcheck is walking the tuple and does 
not correctly update the offset with the length of attribute 26, it may try to 
read attribute 27 at the wrong offset, unsurprisingly leading to garbage, 
perhaps a garbage toast pointer.  The attached patch v7-0004 adds a check to 
verify_heapam to see if the va_toastrelid matches the expected toast table oid 
for the table we're reading.  That check almost certainly should have been 
included in the initial version of verify_heapam, so even if it does nothing to 
help us in this issue, it's good that it be committed, I think.
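The essence of that v7-0004 check can be sketched as follows; the function 
name and message text here are illustrative, not the actual verify_heapam 
symbols:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* A TOAST pointer found in a relation should name that relation's own
 * TOAST table; anything else indicates corruption (or garbage decoded at
 * the wrong offset).  Returns a message to report, or NULL when sane. */
static const char *
check_toast_relid(uint32_t va_toastrelid, uint32_t expected_toastrelid)
{
	if (va_toastrelid != expected_toastrelid)
		return "toast pointer's va_toastrelid differs from the relation's toast table oid";
	return NULL;
}
```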

It is unfortunate that the failing test only runs pg_amcheck after creating 
numerous corruptions, as we can't know if pg_amcheck would have complained 
about pg_statistic before the corruptions were created in other tables, or if 
it only does so after.  The attached patch v7-0003 adds a call to pg_amcheck 
after all tables are created and populated, but before any corruptions are 
caused.  This should help narrow down what is happening, and doesn't hurt to 
leave in place long-term.

I don't immediately see anything wrong with how pg_statistic uses a 
pseudo-type, but it leads me to want to poke a bit more at pg_statistic on 
hornet and tern, though I don't have any regression tests specifically for 
doing so.

Tests v7-0001 and v7-0002 are just repeats of the tests posted previously.



v7-0001-Turning-off-autovacuum-during-corruption-tests.patch
Description: Binary data


v7-0002-Fixing-a-confusing-amcheck-corruption-message.patch
Description: Binary data


v7-0003-Extend-pg_amcheck-test-suite.patch
Description: Binary data


v7-0004-Add-extra-check-of-toast-pointer-in-amcheck.patch
Description: Binary data


—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company





Re: pg_amcheck contrib application

2021-03-15 Thread Mark Dilger


> On Mar 15, 2021, at 11:57 AM, Tom Lane  wrote:
> 
> Yeah, that could be phrased better.

Attaching the 0001 patch submitted earlier, plus 0002 which fixes the confusing 
corruption message.




v6-0001-Turning-off-autovacuum-during-corruption-tests.patch
Description: Binary data


v6-0002-Fixing-a-confusing-amcheck-corruption-message.patch
Description: Binary data

—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company





Re: pg_amcheck contrib application

2021-03-15 Thread Mark Dilger



> On Mar 15, 2021, at 11:57 AM, Tom Lane  wrote:
> 
> Do we have a strong enough lock on
> the table under examination to be sure that autovacuum couldn't remove
> a dead toast entry before we reach it?

The main table and the toast table are only locked with AccessShareLock.  Each 
page in the main table is locked with BUFFER_LOCK_SHARE.  Toast is not checked 
unless the tuple passes visibility checks verifying the tuple is not dead.

>  But this would only be an
> issue if we are trying to check validity of toasted fields within
> known-dead tuples, which I would argue we shouldn't, since lock or
> no lock there's no guarantee the toast entry is still there.

It does not intentionally check toasted fields within dead tuples.  If that is 
happening, it's a bug, possibly in the visibility function.  But I'm not seeing 
a specific reason to assume that is the issue.  If we still see the complaint 
on tern or hornet after committing the patch to turn off autovacuum, we will be 
able to rule out the theory that the toast was removed by autovacuum.

—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company







Re: pg_amcheck contrib application

2021-03-15 Thread Tom Lane
Mark Dilger  writes:
> On Mar 15, 2021, at 10:04 AM, Tom Lane  wrote:
>> These animals have somewhat weird alignment properties: MAXALIGN is 8
>> but ALIGNOF_DOUBLE is only 4.  I speculate that that is affecting their
>> choices about whether an out-of-line TOAST value is needed, breaking
>> this test case.

> The logic in verify_heapam only looks for a value in the toast table if
> the tuple it gets from the main table (in this case, from pg_statistic)
> has an attribute that claims to be toasted.  The error message we're
> seeing that you quoted above simply means that no entry exists in the
> toast table.

Yeah, that could be phrased better.  Do we have a strong enough lock on
the table under examination to be sure that autovacuum couldn't remove
a dead toast entry before we reach it?  But this would only be an
issue if we are trying to check validity of toasted fields within
known-dead tuples, which I would argue we shouldn't, since lock or
no lock there's no guarantee the toast entry is still there.

Not sure that I believe the theory that this is from bad luck of
concurrent autovacuum timing, though.  The fact that we're seeing
this on just those two animals suggests strongly to me that it's
architecture-correlated, instead.

regards, tom lane




Re: pg_amcheck contrib application

2021-03-15 Thread Mark Dilger


> On Mar 15, 2021, at 11:11 AM, Mark Dilger  
> wrote:
> 
> I will submit a patch that turns off autovacuum for the test node shortly.



v5-0001-Turning-off-autovacuum-during-corruption-tests.patch
Description: Binary data

—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company





Re: pg_amcheck contrib application

2021-03-15 Thread Mark Dilger



> On Mar 15, 2021, at 10:04 AM, Tom Lane  wrote:
> 
> Looks like we're not quite out of the woods, as hornet and tern are
> still unhappy:
> 
> #   Failed test 'pg_amcheck excluding all corrupt schemas status (got 2 vs 
> expected 0)'
> #   at t/003_check.pl line 498.
> 
> #   Failed test 'pg_amcheck excluding all corrupt schemas stdout /(?^:^$)/'
> #   at t/003_check.pl line 498.
> #   'heap table "db1"."pg_catalog"."pg_statistic", block 2, 
> offset 1, attribute 27:
> # final toast chunk number 0 differs from expected value 1
> # heap table "db1"."pg_catalog"."pg_statistic", block 2, offset 1, attribute 
> 27:
> # toasted value for attribute 27 missing from toast table
> # '
> # doesn't match '(?^:^$)'
> # Looks like you failed 2 tests of 60.
> [12:18:06] t/003_check.pl ... 
> Dubious, test returned 2 (wstat 512, 0x200)
> Failed 2/60 subtests 
> 
> These animals have somewhat weird alignment properties: MAXALIGN is 8
> but ALIGNOF_DOUBLE is only 4.  I speculate that that is affecting their
> choices about whether an out-of-line TOAST value is needed, breaking
> this test case.

The pg_amcheck test case is not corrupting any pg_catalog tables, but 
contrib/amcheck/verify_heapam is complaining about a corruption in 
pg_catalog.pg_statistic.

The logic in verify_heapam only looks for a value in the toast table if the 
tuple it gets from the main table (in this case, from pg_statistic) has an 
attribute that claims to be toasted.  The error message we're seeing that you 
quoted above simply means that no entry exists in the toast table.  The bit 
about "final toast chunk number 0 differs from expected value 1" is super 
unhelpful, as what it is really saying is that there were no chunks found.  I 
should submit a patch to not print that message in cases where the attribute is 
missing from the toast table.
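For illustration, the chunk bookkeeping described above can be sketched in Python. This is a hypothetical rendering, not the actual C logic in verify_heapam, and it incorporates the suggested improvement of reporting a missing value rather than a confusing final-chunk message when no chunks are found:

```python
def check_toast_chunks(chunk_seqs, expected_chunks):
    """Hypothetical sketch of toast chunk verification.

    chunk_seqs: chunk_seq values found in the toast table, in scan order.
    expected_chunks: chunk count implied by the main tuple's toast pointer.
    """
    errors = []
    expected = 0  # next chunk_seq we expect to see
    for seq in chunk_seqs:
        if seq != expected:
            errors.append(f"toast chunk {seq} found where chunk {expected} was expected")
        expected = seq + 1
    if not chunk_seqs:
        # No chunks at all: report the value as missing, not a chunk-count mismatch.
        errors.append("toasted value missing from toast table")
    elif expected != expected_chunks:
        errors.append(f"value ended at chunk {expected - 1}, expected {expected_chunks} chunks")
    return errors
```

With no chunks found, the sketch emits only the "missing from toast table" message rather than a report about chunk 0.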

Is it possible that pg_statistic really is corrupt here, and that this is not a 
bug in pg_amcheck?  It's not like we've been checking for corruption in the 
build farm up till now.  I notice that this test, as well as test 
005_opclass_damage.pl, neglects to turn off autovacuum for the test node.  So 
maybe the corruptions are getting propagated during autovacuum?  This is just a 
guess, but I will submit a patch that turns off autovacuum for the test node 
shortly.

—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company







Re: pg_amcheck contrib application

2021-03-15 Thread Tom Lane
Looks like we're not quite out of the woods, as hornet and tern are
still unhappy:

#   Failed test 'pg_amcheck excluding all corrupt schemas status (got 2 vs 
expected 0)'
#   at t/003_check.pl line 498.

#   Failed test 'pg_amcheck excluding all corrupt schemas stdout /(?^:^$)/'
#   at t/003_check.pl line 498.
#   'heap table "db1"."pg_catalog"."pg_statistic", block 2, 
offset 1, attribute 27:
# final toast chunk number 0 differs from expected value 1
# heap table "db1"."pg_catalog"."pg_statistic", block 2, offset 1, attribute 27:
# toasted value for attribute 27 missing from toast table
# '
# doesn't match '(?^:^$)'
# Looks like you failed 2 tests of 60.
[12:18:06] t/003_check.pl ... 
Dubious, test returned 2 (wstat 512, 0x200)
Failed 2/60 subtests 

These animals have somewhat weird alignment properties: MAXALIGN is 8
but ALIGNOF_DOUBLE is only 4.  I speculate that that is affecting their
choices about whether an out-of-line TOAST value is needed, breaking
this test case.

regards, tom lane




Re: pg_amcheck contrib application

2021-03-13 Thread Noah Misch
On Sat, Mar 13, 2021 at 10:51:27AM -0800, Mark Dilger wrote:
> > On Mar 13, 2021, at 10:46 AM, Noah Misch  wrote:
> > On Fri, Mar 12, 2021 at 05:04:09PM -0800, Mark Dilger wrote:
> >>> On Mar 12, 2021, at 3:24 PM, Mark Dilger  
> >>> wrote:
> >>> and the second deals with an apparent problem with IPC::Run shell 
> >>> expanding an asterisk on some platforms but not others.  That second one, 
> >>> if true, seems like a problem with scope beyond the pg_amcheck project, 
> >>> as TestLib::command_checks_all uses IPC::Run, and it would be desirable 
> >>> to have consistent behavior across platforms.
> > 
> >>> One of pg_amcheck's regression tests was passing an asterisk through
> >>> TestLib's command_checks_all() command, which gets through to
> >>> pg_amcheck without difficulty on most platforms, but appears to get
> >>> shell expanded on Windows (jacana) and AIX (hoverfly).
> > 
> > For posterity, I can't reproduce this on hoverfly.  The suite fails the same
> > way at f371a4c and f371a4c^.  More recently (commit 58f5749), the suite 
> > passes
> > even after reverting f371a4c.  Self-contained IPC::Run usage also does not
> > corroborate the theory:
> > 
> > [nm@power8-aix 8:0 2021-03-13T18:32:23 clean 0]$ perl -MIPC::Run -e 
> > 'IPC::Run::run "printf", "%s\n", "*"'
> > *
> > [nm@power8-aix 8:0 2021-03-13T18:32:29 clean 0]$ perl -MIPC::Run -e 
> > 'IPC::Run::run "sh", "-c", "printf %s\n *"'
> > COPYRIGHT
> > GNUmakefile.in
> > HISTORY
> > Makefile
> > README
> > README.git
> > aclocal.m4
> > config
> > configure
> > configure.ac
> > contrib
> > doc
> > src

> The reason I suspected that passing the '*' through IPC::Run was the problem 
> had to do with the error that pg_amcheck gave.  It complained that too many 
> arguments were being passed to it, and that the first such argument was 
> "pg_amcheck.c".  There's no reason pg_amcheck should know its source file 
> name, nor should the regression test know that, which suggested that the 
> asterisk was being shell expanded within the src/bin/pg_amcheck/ directory 
> and the file listing was being passed to pg_amcheck as arguments.

I agree.  I can reproduce the problem on Windows.  Commit f371a4c fixed
http://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=fairywren&dt=2021-03-12%2020%3A12%3A44
and I see logs of that kind of failure only on fairywren and jacana.




Re: pg_amcheck contrib application

2021-03-13 Thread Mark Dilger



> On Mar 13, 2021, at 10:46 AM, Noah Misch  wrote:
> 
> On Fri, Mar 12, 2021 at 05:04:09PM -0800, Mark Dilger wrote:
>>> On Mar 12, 2021, at 3:24 PM, Mark Dilger  
>>> wrote:
>>> 
>>> and the second deals with an apparent problem with IPC::Run shell expanding 
>>> an asterisk on some platforms but not others.  That second one, if true, 
>>> seems like a problem with scope beyond the pg_amcheck project, as 
>>> TestLib::command_checks_all uses IPC::Run, and it would be desirable to 
>>> have consistent behavior across platforms.
> 
>>> One of pg_amcheck's regression tests was passing an asterisk through
>>> TestLib's command_checks_all() command, which gets through to
>>> pg_amcheck without difficulty on most platforms, but appears to get
>>> shell expanded on Windows (jacana) and AIX (hoverfly).
> 
> For posterity, I can't reproduce this on hoverfly.  The suite fails the same
> way at f371a4c and f371a4c^.  More recently (commit 58f5749), the suite passes
> even after reverting f371a4c.  Self-contained IPC::Run usage also does not
> corroborate the theory:
> 
> [nm@power8-aix 8:0 2021-03-13T18:32:23 clean 0]$ perl -MIPC::Run -e 
> 'IPC::Run::run "printf", "%s\n", "*"'
> *
> [nm@power8-aix 8:0 2021-03-13T18:32:29 clean 0]$ perl -MIPC::Run -e 
> 'IPC::Run::run "sh", "-c", "printf %s\n *"'
> COPYRIGHT
> GNUmakefile.in
> HISTORY
> Makefile
> README
> README.git
> aclocal.m4
> config
> configure
> configure.ac
> contrib
> doc
> src
> 
>> there is a similar symptom caused by an unrelated problem
> 
>> Subject: [PATCH v3] Avoid use of non-portable option ordering in
>> command_checks_all().
> 
> On a glibc system, "env POSIXLY_CORRECT=1 make check ..." tests this.

Thanks for investigating!

The reason I suspected that passing the '*' through IPC::Run was the problem had 
to do with the error that pg_amcheck gave.  It complained that too many 
arguments were being passed to it, and that the first such argument was 
"pg_amcheck.c".  There's no reason pg_amcheck should know its source file name, 
nor should the regression test know that, which suggested that the asterisk was 
being shell expanded within the src/bin/pg_amcheck/ directory and the file 
listing was being passed to pg_amcheck as arguments.
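A minimal sketch of that failure mode, using Python's glob module to stand in for the shell's expansion step (the directory contents here are made up for illustration):

```python
import glob
import os
import tempfile

# If argv is passed through verbatim, pg_amcheck would see a literal '*';
# if some layer glob-expands it against the source directory first, it sees
# the file listing instead, and the first "extra argument" is a file name.
with tempfile.TemporaryDirectory() as srcdir:
    for name in ("pg_amcheck.c", "t"):  # stand-ins for src/bin/pg_amcheck contents
        open(os.path.join(srcdir, name), "w").close()

    verbatim_argv = ["*"]  # no expansion: the literal asterisk survives
    expanded_argv = sorted(glob.glob(os.path.join(srcdir, "*")))  # shell-style expansion
    first_extra = os.path.basename(expanded_argv[0])  # a source file name
```

The expanded case reproduces the reported symptom: the program receives source-tree file names such as "pg_amcheck.c" as arguments rather than the asterisk itself.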

That theory may have been wrong, but it was the only theory I had at the time.  
I don't have access to any of the machines where that happened, so it is hard 
for me to investigate further.

—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company







Re: pg_amcheck contrib application

2021-03-13 Thread Noah Misch
On Fri, Mar 12, 2021 at 05:04:09PM -0800, Mark Dilger wrote:
> > On Mar 12, 2021, at 3:24 PM, Mark Dilger  
> > wrote:
> > 
> > and the second deals with an apparent problem with IPC::Run shell expanding 
> > an asterisk on some platforms but not others.  That second one, if true, 
> > seems like a problem with scope beyond the pg_amcheck project, as 
> > TestLib::command_checks_all uses IPC::Run, and it would be desirable to 
> > have consistent behavior across platforms.

> > One of pg_amcheck's regression tests was passing an asterisk through
> > TestLib's command_checks_all() command, which gets through to
> > pg_amcheck without difficulty on most platforms, but appears to get
> > shell expanded on Windows (jacana) and AIX (hoverfly).

For posterity, I can't reproduce this on hoverfly.  The suite fails the same
way at f371a4c and f371a4c^.  More recently (commit 58f5749), the suite passes
even after reverting f371a4c.  Self-contained IPC::Run usage also does not
corroborate the theory:

[nm@power8-aix 8:0 2021-03-13T18:32:23 clean 0]$ perl -MIPC::Run -e 
'IPC::Run::run "printf", "%s\n", "*"'
*
[nm@power8-aix 8:0 2021-03-13T18:32:29 clean 0]$ perl -MIPC::Run -e 
'IPC::Run::run "sh", "-c", "printf %s\n *"'
COPYRIGHT
GNUmakefile.in
HISTORY
Makefile
README
README.git
aclocal.m4
config
configure
configure.ac
contrib
doc
src

> there is a similar symptom caused by an unrelated problem

> Subject: [PATCH v3] Avoid use of non-portable option ordering in
>  command_checks_all().

On a glibc system, "env POSIXLY_CORRECT=1 make check ..." tests this.
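The portability wrinkle being tested is that glibc's getopt permutes argv by default (options placed after operands still parse), while POSIX mode, and platforms such as AIX, stop at the first non-option. As an aside, Python's getopt module exposes both behaviors side by side, which makes a compact illustration of why option ordering matters (hypothetical example, unrelated to the actual TestLib code):

```python
import getopt

argv = ["somedb", "-v"]  # an option placed *after* an operand

# POSIX-style parsing (what POSIXLY_CORRECT forces on glibc):
# parsing stops at the first non-option, so -v is never recognized.
posix_opts, posix_rest = getopt.getopt(argv, "v")

# GNU-style parsing: argv is permuted, so -v is recognized anyway.
gnu_opts, gnu_rest = getopt.gnu_getopt(argv, "v")
```

Under POSIX rules the -v ends up among the operands; under GNU rules it is parsed as an option, which is exactly the divergence the non-portable option ordering tripped over.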




Re: pg_amcheck contrib application

2021-03-13 Thread Robert Haas
On Sat, Mar 13, 2021 at 10:20 AM Mark Dilger
 wrote:
> We want to use values that don't look like any of the others.  The complete 
> set of nibbles in the values above is [012345678B], leaving the set [9ACDEF] 
> unused.  The attached patch uses the value DEADF9F9 as it seems a little 
> easier to remember than other permutations of those nibbles.

OK, committed. The nibbles seem less relevant than the bytes as a
whole, but that's fine.

-- 
Robert Haas
EDB: http://www.enterprisedb.com




Re: pg_amcheck contrib application

2021-03-13 Thread Mark Dilger


> On Mar 13, 2021, at 6:50 AM, Tom Lane  wrote:
> 
> Robert Haas  writes:
>> I think it would be good to use a non-zero value here. We're doing a
>> lot of poking into raw bytes here, and if something goes wrong, a zero
>> value is more likely to look like something normal than whatever else.
>> I suggest picking a value where all 8 bytes are the same, but not
>> zero, and ideally chosen so that they don't look much like any of the
>> surrounding bytes.
> 
> Actually, it seems like we can let pack/unpack deal with byte-swapping
> within 32-bit words; what we lose by not using 'q' format is just the
> ability to correctly swap the two 32-bit words.  Hence, any value in
> which upper and lower halves are the same should work, say
> 0x1234567812345678.
> 
>   regards, tom lane

The heap tuple in question looks as follows, with  in the spot we're 
debating:

Tuple Layout (values in hex):

t_xmin: 223
t_xmax: 0
t_field3: 0
bi_hi: 0
bi_lo: 0
ip_posid: 1
t_infomask2: 3
t_infomask: b06
t_hoff: 18
t_bits: 0
a_1: 
a_2: 
b_header: 11   # little-endian, will be 88 on big endian
b_body1: 61
b_body2: 62
b_body3: 63
b_body4: 64
b_body5: 65
b_body6: 66
b_body7: 67
c_va_header: 1
c_va_vartag: 12
c_va_rawsize: 2714
c_va_extsize: 2710

valueid and toastrelid are not shown, as they won't be stable.  Relying on 
t_xmin to be stable makes the test brittle, but fortunately that is separated 
from a_1 and a_2 far enough that we should not need to worry about it.

We want to use values that don't look like any of the others.  The complete set 
of nibbles in the values above is [012345678B], leaving the set [9ACDEF] 
unused.  The attached patch uses the value DEADF9F9 as it seems a little easier 
to remember than other permutations of those nibbles.
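The underlying property, per Tom's note upthread, is that a 64-bit sentinel whose two 32-bit halves are identical is unaffected by word swapping, so packing it as two 32-bit values avoids needing the 'q' format. A quick Python sketch (struct standing in for Perl's pack/unpack) checks this for the chosen value:

```python
import struct

# DEADF9F9 repeated: the upper and lower 32-bit words are identical, so
# swapping the two words is a no-op, and only byte order *within* each
# word matters, which 32-bit pack formats already handle.
value = 0xDEADF9F9DEADF9F9

hi, lo = value >> 32, value & 0xFFFFFFFF
swapped = (lo << 32) | hi  # what a word-swapping platform would produce

# Packing as two little-endian 32-bit words reproduces the 64-bit value.
packed = struct.pack("<2L", lo, hi)
roundtrip = struct.unpack("<Q", packed)[0]
```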



v6-0001-pg_amcheck-continuing-to-fix-portability-problems.patch
Description: Binary data


—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




