On Wed, Dec 13, 2017 at 7:02 AM, Thomas Munro <thomas.mu...@enterprisedb.com> wrote:
> Hi,
>
> Here's a reproducer which enabled me to reach this stuck state:
>
>   pid  |  wait_event   | query
> -------+---------------+-----------------------------------------------------------------------------
>  64617 |               | select pid, wait_event, query from pg_stat_activity where state = 'active';
>  64619 | BufferPin     | VACUUM jobs
>  64620 | ExecuteGather | SELECT COUNT(*) FROM jobs
>  64621 | ExecuteGather | SELECT COUNT(*) FROM jobs
>  64622 | ExecuteGather | SELECT COUNT(*) FROM jobs
>  64623 | ExecuteGather | SELECT COUNT(*) FROM jobs
>  84167 | BtreePage     | SELECT COUNT(*) FROM jobs
>  84168 | BtreePage     | SELECT COUNT(*) FROM jobs
>  96440 |               | SELECT COUNT(*) FROM jobs
>  96438 |               | SELECT COUNT(*) FROM jobs
>  96439 |               | SELECT COUNT(*) FROM jobs
> (11 rows)
>
> The main thread deletes stuff in the middle of the key range (not sure
> if this is important) and vacuum in a loop, and meanwhile 4 threads
> (probably not important, might as well be 1) run Parallel Index Scans
> over the whole range, in the hope of hitting the interesting case.  In
> the locked-up case I just saw now opaque->btpo_flags had the
> BTP_DELETED bit set, not BTP_HALF_DEAD (I could tell because I added
> logging).
>
Good.  I hope that the patch I have posted above is able to resolve this
problem.  I am asking because you haven't explicitly mentioned that.

> Clearly pages are periodically being marked half-dead but I
> haven't yet managed to get an index scan to hit one of those.
>

I think Kuntal has already been able to hit that case, so maybe that is
enough.

--
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com