> On Mar 8, 2021, at 8:26 AM, Robert Haas <robertmh...@gmail.com> wrote:
> 
> On Thu, Mar 4, 2021 at 5:39 PM Mark Dilger <mark.dil...@enterprisedb.com> wrote:
>> I think Robert mistook why I was doing that.  I was thinking about a 
>> different usage pattern.  If somebody thinks a subset of relations have been 
>> badly corrupted, but doesn't know which relations those might be, they might 
>> try to find them with pg_amcheck, but wanting to just check the first few 
>> blocks per relation in order to sample the relations.  So,
>> 
>>  pg_amcheck --startblock=0 --endblock=9 --no-dependent-indexes
>> 
>> or something like that.  I don't think it's very fun to have it error out 
>> for each relation that doesn't have at least ten blocks, nor is it fun to 
>> have those relations skipped by error'ing out before checking any blocks, as 
>> they might be the corrupt relations you are looking for.  But using 
>> --startblock and --endblock for this is not a natural fit, as evidenced by 
>> how I was trying to "fix things up" for the user, so I'll punt on this usage 
>> until some future version, when I might add a sampling option.
> 
> I admit I hadn't thought of that use case. I guess somebody could want
> to do that, but it doesn't seem all that useful. Checking the first
> up-to-ten blocks of every relation is not a very representative
> sample, and it's not clear to me that sampling is a good idea even if
> it were representative. What good is it to know that 10% of my
> database is probably not corrupted?


`cd $PGDATA; tar xfz my_csv_data.tgz` ctrl-C ctrl-C ctrl-C
`rm -rf $PGDATA` ctrl-C ctrl-C ctrl-C
`/my/stupid/backup/and/restore/script.sh` ctrl-C ctrl-C ctrl-C

# oh wow, i wonder if any relations got overwritten with csv file data, or had
# their relation files unlinked, or ...?

`pg_amcheck --jobs=8 --startblock=0 --endblock=10`

# ah, darn, it's spewing lots of irrelevant errors because some relations are
# too short

`pg_amcheck --jobs=8 --startblock=0 --endblock=0`

# ah, darn, it's still spewing lots of irrelevant errors because I have lots of
# indexes with zero blocks of data

`pg_amcheck --jobs=8`

# ah, darn, it's taking forever, because it's processing huge tables in their
# entirety

I agree this can be left to later, and the --startblock and --endblock options 
are the wrong way to do it.
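
To make the idea concrete, a future sampling option could clamp the requested block range to each relation's actual length rather than erroring out. The following is only a rough sketch of that clamping logic in Python; the helper name clamp_block_range and its signature are hypothetical and are not part of pg_amcheck.

```python
from typing import Optional, Tuple


def clamp_block_range(nblocks: int, start: int, end: int) -> Optional[Tuple[int, int]]:
    """Hypothetical helper: restrict the inclusive range [start, end] to the
    blocks that exist in a relation of nblocks blocks.

    Returns None when no block in the requested range exists, so the caller
    can skip block-range checks for that relation instead of raising an error.
    """
    if nblocks <= 0 or end < start or start >= nblocks:
        return None
    # Block numbers are zero-based, so the last valid block is nblocks - 1.
    return (max(start, 0), min(end, nblocks - 1))


# A 3-block relation sampled with blocks 0..9 yields only the blocks it has.
print(clamp_block_range(3, 0, 9))
# A zero-block index yields nothing to check rather than an error.
print(clamp_block_range(0, 0, 0))
```

With something like this, short relations would be sampled as far as they go, and empty indexes would simply be skipped for the block-range portion of the check.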

—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




