On Wed, Jul 12, 2023 at 8:52 PM Justin Pryzby <pry...@telsasoft.com> wrote:
>
> On Mon, Jul 10, 2023 at 09:01:37PM -0500, Justin Pryzby wrote:
> > An instance compiled locally, without assertions, failed like this:
> >
> ...
> >
> > => REINDEX was running, with parallel workers, but deadlocked with
> > ANALYZE, and then crashed.
> >
> > It looks like parallel workers are needed to hit this issue.
> > I'm not sure if the issue is specific to extended stats - probably not.
> >
> > I reproduced the crash with manual REINDEX+ANALYZE, and with assertions
> > (which were not hit), and on a more recent commit (1124cb2cf). The crash
> > is hit about 30% of the time when running a loop around REINDEX and then
> > also running ANALYZE.
> >
> > I hope someone has a hunch where to look; so far, I wasn't able to create a
> > minimal reproducer.
>
> I was able to reproduce this in isolation by reloading data into a test
> instance, ANALYZEing the DB to populate pg_statistic_ext_data (so it's
> over 3MB in size), and then REINDEXing the stats_ext index in a loop
> while ANALYZEing a table with extended stats.
>
> I still don't have a minimal reproducer, but on a hunch I found that
> this fails at 5764f611e but not its parent.
>
> commit 5764f611e10b126e09e37fdffbe884c44643a6ce
> Author: Andres Freund <and...@anarazel.de>
> Date:   Wed Jan 18 11:41:14 2023 -0800
>
>     Use dlist/dclist instead of PROC_QUEUE / SHM_QUEUE for heavyweight locks
>

Good catch. I hadn't noticed this email, but while investigating the same
issue, which was reported recently [1], I arrived at the same commit. I've
sent my analysis and a patch to fix the issue in that thread.

Andres, since this issue seems to be related to your commit 5764f611e,
could you please look at it and at my patch?

Regards,

[1] https://www.postgresql.org/message-id/CAD21AoDs7vzK7NErse7xTruqY-FXmM%2B3K00SdBtMcQhiRNkoeQ%40mail.gmail.com

--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com