On Fri, Jun 10, 2016 at 1:11 AM, Robert Haas <robertmh...@gmail.com> wrote:
> On Thu, Jun 9, 2016 at 5:48 AM, Amit Kapila <amit.kapil...@gmail.com> wrote:
>> Attached patch implements the above 2 functions.  I have addressed the
>> comments by Sawada San and you in latest patch and updated the documentation
>> as well.
> I made a number of changes to this patch.  Here is the new version.
> 1. The algorithm you were using for growing the array size is unsafe
> and can easily overrun the array.  Suppose that each of the first two
> pages have some corrupt tuples, more than 50% of MaxHeapTuplesPerPage
> but less than the full value of MaxTuplesPerPage.  Your code will
> conclude that the array does need to be enlarged after processing the
> first page.  I switched this to what I consider the normal coding
> pattern for such problems.
> 2. The all-visible checks seemed to me to be incorrect and incomplete.
> I made the check match the logic in lazy_scan_heap.
> 3. Your 1.0 -> 1.1 upgrade script was missing copies of the REVOKE
> statements you added to the 1.1 script.  I added them.
> 4. The tests as written were not safe under concurrency; they could
> return spurious results if the page changed between the time you
> checked the visibility map and the time you actually examined the
> tuples.  I think people will try running these functions on live
> systems, so I changed the code to recheck the VM bits after locking
> the page.  Unfortunately, there's either still a concurrency-related
> problem here or there's a bug in the all-frozen code itself because I
> once managed to get pg_check_frozen('pgbench_accounts') to return a
> TID while pgbench was running concurrently.  That's a bit alarming,
> but since I can't reproduce it I don't really have a clue how to track
> down the problem.
> 5. I made various cosmetic improvements.
> If there are not objections, I will go ahead and commit this tomorrow,
> because even if there is a bug (see point #4 above) I think it's
> better to have this in the tree than not.  However, code review and/or
> testing with these new functions seems like it would be an extremely
> good idea.

Thank you for working on this.
Here are some minor comments.

+ * Return the TIDs of not-all-visible tuples in pages marked all-visible

If there is even one non-visible tuple in pages marked all-visible,
the database might be corrupted.
Is it better "not-visible" or "non-visible" instead of "not-all-visible"?
Do we need to check page header flag?
I think that database also might be corrupt in case where there is
non-visible tuple in page set PD_ALL_VISIBLE.
We could emit the WARNING log in such case.

Also, using attached tool which allows us to set spurious visibility
map status without actual modifying the tuple , I manually made the
some situations where database is corrupted and tested it, but ISTM
that this tool works fine.
It doesn't mean proposing as a new feature of course, but please use
it as appropriate.


Masahiko Sawada

Attachment: corrupted_test.sql
Description: Binary data

Attachment: set_spurious_vm_status.patch
Description: binary/octet-stream

Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:

Reply via email to