On Mon, Jun 5, 2017 at 10:40 AM, Andrew Dunstan <andrew.duns...@2ndquadrant.com> wrote:
> Buildfarm member lorikeet is failing occasionally with a failed
> assertion during the select_parallel regression tests like this:
>
> 2017-06-03 05:12:37.382 EDT [59327d84.1160:38] LOG: statement: select
> count(*) from tenk1, tenk2 where tenk1.hundred > 1 and tenk2.thousand=0;
> TRAP: FailedAssertion("!(vmq->mq_sender == ((void *)0))", File:
> "/home/andrew/bf64/root/HEAD/pgsql.build/../pgsql/src/backend/storage/ipc/shm_mq.c",
> Line: 221)
>
> I'll see if I can find out why, but if anyone has any ideas why this might be
> happening (started about 3 weeks ago) that would be helpful.
I don't *think* we've made any relevant code changes lately. The only thing I can see that looks at all relevant is b6dd1271281ce856ab774fc0b491a92878e3b501, but that doesn't really seem like it can be to blame.

One thought is that the only places where shm_mq_set_sender() should be getting invoked during the main regression tests are ParallelWorkerMain() and ExecParallelGetReceiver(), and both of those places use ParallelWorkerNumber to figure out what address to pass. So if ParallelWorkerNumber were getting set to the same value in two different parallel workers (e.g. because the postmaster went nuts and launched two processes instead of only one), or if ParallelWorkerNumber were not getting initialized at all, or were getting initialized to some completely bogus value, it could cause this symptom.

What ought to be happening, if there are N workers launched by a parallel query, is that ParallelWorkerNumber should be different in every worker, over the range 0 to N-1. I think if I were you my first step would be to verify that ParallelWorkerNumber is in that range in the crashed backend, and if it is, my second step would be to add some debugging code to ParallelWorkerMain() to print it out in every worker that gets launched and make sure they're all in range and different.

All of the above might be going in the wrong direction entirely, but it's the first thing that comes to mind for me.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
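To make the suggested check concrete, here is a standalone toy model in C. It is not PostgreSQL code: ToyQueue, toy_set_sender(), toy_launch_workers(), and the fake PIDs are hypothetical names made up for illustration. It assumes, per the reasoning above, that a worker attaches as sender to the queue selected by its ParallelWorkerNumber, so a duplicate or out-of-range number shows up either as a range error or as a second sender on one queue, which is the condition the shm_mq.c assertion guards against.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

#define TOY_MAX_WORKERS 64
#define TOY_NO_SENDER   (-1)

/* Toy stand-in for one shm_mq: only tracks which process is the sender. */
typedef struct
{
    int sender_pid;             /* TOY_NO_SENDER until a worker attaches */
} ToyQueue;

/*
 * Toy stand-in for shm_mq_set_sender(): the real function asserts that no
 * sender has been set yet; here the violation is reported instead of aborting.
 */
static bool
toy_set_sender(ToyQueue *q, int pid)
{
    if (q->sender_pid != TOY_NO_SENDER)
        return false;
    q->sender_pid = pid;
    return true;
}

/*
 * Simulate nworkers workers, where worker i believes its ParallelWorkerNumber
 * is numbers[i] and attaches as sender to queues[numbers[i]].
 * Returns true only if every number is in 0..nworkers-1 and no queue ends up
 * with two senders.
 */
static bool
toy_launch_workers(const int *numbers, int nworkers)
{
    ToyQueue queues[TOY_MAX_WORKERS];

    assert(numbers != NULL);
    if (nworkers < 0 || nworkers > TOY_MAX_WORKERS)
        return false;

    for (int i = 0; i < nworkers; i++)
        queues[i].sender_pid = TOY_NO_SENDER;

    for (int i = 0; i < nworkers; i++)
    {
        int n = numbers[i];
        int pid = 1000 + i;     /* fake PID, used only in the debug output */

        /* The kind of per-worker debug line suggested for ParallelWorkerMain(). */
        printf("worker pid %d: ParallelWorkerNumber = %d\n", pid, n);

        if (n < 0 || n >= nworkers)
        {
            printf("  out of range 0..%d\n", nworkers - 1);
            return false;
        }
        if (!toy_set_sender(&queues[n], pid))
        {
            printf("  queue %d already has sender %d\n", n, queues[n].sender_pid);
            return false;
        }
    }
    return true;
}
```

For example, with four workers the numbers {0, 1, 2, 3} pass, {0, 1, 1, 3} fail when the third worker tries to become a second sender on queue 1, and {0, 1, 2, 7} fail the range check. In the real server the equivalent would just be a temporary log line in ParallelWorkerMain() and a look over the output.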