On Mon, Jun 5, 2017 at 10:40 AM, Andrew Dunstan
<andrew.duns...@2ndquadrant.com> wrote:
> Buildfarm member lorikeet is failing occasionally with a failed
> assertion during the select_parallel regression tests like this:
>     2017-06-03 05:12:37.382 EDT [59327d84.1160:38] LOG:  statement: select count(*) from tenk1, tenk2 where tenk1.hundred > 1 and tenk2.thousand=0;
>     TRAP: FailedAssertion("!(vmq->mq_sender == ((void *)0))", File: "/home/andrew/bf64/root/HEAD/pgsql.build/../pgsql/src/backend/storage/ipc/shm_mq.c", Line: 221)
> I'll see if I can find out why, but if anyone has any ideas why this might be
> happening (started about 3 weeks ago) that would be helpful.

I don't *think* we've made any relevant code changes lately.  The only
commit I can see that looks at all relevant is
b6dd1271281ce856ab774fc0b491a92878e3b501, but that doesn't really seem
like it can be to blame.

One thought is that the only places where shm_mq_set_sender() should
be getting invoked during the main regression tests are
ParallelWorkerMain() and ExecParallelGetReceiver(), and both of those
places use ParallelWorkerNumber to figure out what address to pass.
So if ParallelWorkerNumber were getting set to the same value in two
different parallel workers - e.g. because the postmaster went nuts and
launched two processes instead of only one - or if
ParallelWorkerNumber were not getting initialized at all or were
getting initialized to some completely bogus value, it could cause
this symptom.
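
To make the failure mode above concrete, here is a minimal standalone
sketch. It is not PostgreSQL code: toy_mq, toy_attach_sender() and the
result codes are hypothetical stand-ins, and the only assumption taken
from the report is that a queue's sender must still be unset when a
worker registers itself. Each worker picks its queue by worker number,
so two workers with the same number land on the same queue and the
second one finds the sender already set.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-in for a shared message queue: only the sender field matters here. */
typedef struct toy_mq
{
    const void *sender;     /* NULL until some worker registers as sender */
} toy_mq;

/* Hypothetical result codes for this sketch. */
enum attach_result
{
    ATTACH_OK,
    ATTACH_BAD_NUMBER,          /* worker number outside 0 .. nqueues-1 */
    ATTACH_SENDER_ALREADY_SET   /* the case the real assertion would trap on */
};

/*
 * Register "proc" as sender of the queue selected by worker_number.
 * Reports the problem instead of aborting, so the cases can be compared.
 */
enum attach_result
toy_attach_sender(toy_mq *queues, int nqueues, int worker_number,
                  const void *proc)
{
    /* A NULL proc would be indistinguishable from "no sender yet". */
    assert(proc != NULL);

    if (worker_number < 0 || worker_number >= nqueues)
        return ATTACH_BAD_NUMBER;
    if (queues[worker_number].sender != NULL)
        return ATTACH_SENDER_ALREADY_SET;

    queues[worker_number].sender = proc;
    return ATTACH_OK;
}
```

With two queues, attaching worker number 0 twice yields
ATTACH_SENDER_ALREADY_SET on the second call, which is the analogue of
the trap in the report, while a number such as 5 is rejected as out of
range.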

What ought to be happening, if there are N workers launched by a
parallel query, is that ParallelWorkerNumber should be different in
every worker, over the range 0 to N-1.  I think if I were you my first
step would be to verify that ParallelWorkerNumber is in that range in
the crashed backend, and if it is, my second step would be to add some
debugging code to ParallelWorkerMain() to print it out in every worker
that gets launched and make sure they're all in range and different.
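
The check described above could be sketched roughly as follows, assuming
the worker numbers have already been collected somehow (for example by
logging ParallelWorkerNumber from each worker and reading the values
back from the log). check_worker_numbers() and its result codes are
hypothetical names for this illustration, not anything in the tree.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical result codes for this sketch. */
enum check_result
{
    CHECK_OK = 0,
    CHECK_OUT_OF_RANGE = 1,     /* some value is not in 0 .. nworkers-1 */
    CHECK_DUPLICATE = 2         /* two workers reported the same value */
};

/*
 * Verify that the nworkers reported values are each within 0 .. nworkers-1
 * and pairwise distinct.  Quadratic, which is fine for a handful of workers.
 */
enum check_result
check_worker_numbers(const int *numbers, int nworkers)
{
    assert(nworkers >= 0);
    assert(numbers != NULL || nworkers == 0);

    for (int i = 0; i < nworkers; i++)
    {
        if (numbers[i] < 0 || numbers[i] >= nworkers)
            return CHECK_OUT_OF_RANGE;

        /* Compare against every value already seen. */
        for (int j = 0; j < i; j++)
        {
            if (numbers[j] == numbers[i])
                return CHECK_DUPLICATE;
        }
    }
    return CHECK_OK;
}
```

For three workers, the values {2, 0, 1} pass, {0, 1, 1} are flagged as a
duplicate, and {0, 1, 3} are flagged as out of range.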

All of the above might be going in the wrong direction entirely, but
it's the first thing that comes to mind for me.

Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)