> On Mar 8, 2021, at 8:26 AM, Robert Haas <robertmh...@gmail.com> wrote:
>
> On Thu, Mar 4, 2021 at 5:39 PM Mark Dilger <mark.dil...@enterprisedb.com>
> wrote:
>> I think Robert mistook why I was doing that. I was thinking about a
>> different usage pattern. If somebody thinks a subset of relations have been
>> badly corrupted, but doesn't know which relations those might be, they might
>> try to find them with pg_amcheck, but wanting to just check the first few
>> blocks per relation in order to sample the relations. So,
>>
>> pg_amcheck --startblock=0 --endblock=9 --no-dependent-indexes
>>
>> or something like that. I don't think it's very fun to have it error out
>> for each relation that doesn't have at least ten blocks, nor is it fun to
>> have those relations skipped by error'ing out before checking any blocks, as
>> they might be the corrupt relations you are looking for. But using
>> --startblock and --endblock for this is not a natural fit, as evidenced by
>> how I was trying to "fix things up" for the user, so I'll punt on this usage
>> until some future version, when I might add a sampling option.
>
> I admit I hadn't thought of that use case. I guess somebody could want
> to do that, but it doesn't seem all that useful. Checking the first
> up-to-ten blocks of every relation is not a very representative
> sample, and it's not clear to me that sampling is a good idea even if
> it were representative. What good is it to know that 10% of my
> database is probably not corrupted?

`cd $PGDATA; tar xfz my_csv_data.tgz` ctrl-C ctrl-C ctrl-C
`rm -rf $PGDATA` ctrl-C ctrl-C ctrl-C
`/my/stupid/backup/and/restore/script.sh` ctrl-C ctrl-C ctrl-C
# oh wow, i wonder if any relations got overwritten with csv file data, or had
# their relation files unlinked, or ...?
`pg_amcheck --jobs=8 --startblock=0 --endblock=10`
# ah, darn, it's spewing lots of irrelevant errors because some relations are
# too short
`pg_amcheck --jobs=8 --startblock=0 --endblock=0`
# ah, darn, it's still spewing lots of irrelevant errors because I have lots of
# indexes with zero blocks of data
`pg_amcheck --jobs=8`
# ah, darn, it's taking forever, because it's processing huge tables in their
# entirety

I agree this can be left for later, and that the --startblock and --endblock
options are the wrong way to do it.
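
For what it's worth, the behavior I had in mind for a future sampling option is roughly "clamp the requested range to whatever the relation actually has, and quietly skip relations with nothing to check". A minimal sketch of that rule follows. The function name and the relation sizes are hypothetical, and none of this is something pg_amcheck does today; it only assumes block numbers are zero-based and the range is inclusive, as in the examples above.

```python
from typing import Optional, Tuple


def clamp_block_range(nblocks: int, start: int, end: int) -> Optional[Tuple[int, int]]:
    """Clamp an inclusive, zero-based block range to a relation of nblocks blocks.

    Returns None when there is nothing to check (an empty relation, or a
    requested start past the last block) instead of raising an error.
    """
    if start < 0 or end < start:
        raise ValueError("invalid block range")
    if nblocks <= 0 or start >= nblocks:
        return None
    return start, min(end, nblocks - 1)


if __name__ == "__main__":
    # Hypothetical relation sizes in blocks, for illustration only.
    sizes = {"big_table": 100000, "small_table": 3, "empty_index": 0}
    for name, nblocks in sizes.items():
        print(name, clamp_block_range(nblocks, 0, 9))
```

With a rule like this, a short relation is checked as far as it goes and a zero-block index produces no noise, which is what the three attempts above were missing.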
—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company