Hi,

on master as of a0ae54d, there's a roughly one-in-ten-million chance per query that sqlsmith catches gather_readnext() reading beyond the gatherstate->reader array via gatherstate->reader[gatherstate->nextreader]. Sample backtrace below.
As nextreader is never explicitly re-initialized, I think what happens is that a rescan gets fewer workers than the initial scan, and the stale nextreader then points outside the array. I'm no longer seeing these crashes when explicitly initializing nextreader to 0, as in the attached patch.

regards,
Andreas

Program terminated with signal SIGSEGV, Segmentation fault.
#0  shm_mq_receive (mqh=0x259, nbytesp=nbytesp@entry=0x7ffc55ce0580, datap=datap@entry=0x7ffc55ce0588, nowait=nowait@entry=1 '\001') at shm_mq.c:520
520             shm_mq     *mq = mqh->mqh_queue;
(gdb) bt
#0  shm_mq_receive (mqh=0x259, nbytesp=nbytesp@entry=0x7ffc55ce0580, datap=datap@entry=0x7ffc55ce0588, nowait=nowait@entry=1 '\001') at shm_mq.c:520
#1  0x000000000060b8b7 in TupleQueueReaderNext (reader=reader@entry=0x5446c10, nowait=nowait@entry=1 '\001', done=done@entry=0x7ffc55ce065b "") at tqueue.c:692
#2  0x00000000005f5e03 in gather_readnext (gatherstate=0x52a9918) at nodeGather.c:339
#3  gather_getnext (gatherstate=0x52a9918) at nodeGather.c:292
#4  ExecGather (node=node@entry=0x52a9918) at nodeGather.c:233
#5  0x00000000005e3b68 in ExecProcNode (node=0x52a9918) at execProcnode.c:515
#6  0x00000000005eb2f2 in ExecScanFetch (recheckMtd=0x605e40 <SubqueryRecheck>, accessMtd=0x605e50 <SubqueryNext>, node=0x52a86c0) at execScan.c:95
#7  ExecScan (node=node@entry=0x52a86c0, accessMtd=accessMtd@entry=0x605e50 <SubqueryNext>, recheckMtd=recheckMtd@entry=0x605e40 <SubqueryRecheck>) at execScan.c:180
#8  0x0000000000605e6f in ExecSubqueryScan (node=node@entry=0x52a86c0) at nodeSubqueryscan.c:85
#9  0x00000000005e3c68 in ExecProcNode (node=node@entry=0x52a86c0) at execProcnode.c:445
#10 0x00000000006001d6 in ExecNestLoop (node=node@entry=0x52a7978) at nodeNestloop.c:123
#11 0x00000000005e3bf8 in ExecProcNode (node=node@entry=0x52a7978) at execProcnode.c:476
#12 0x00000000006001d6 in ExecNestLoop (node=node@entry=0x52a5120) at nodeNestloop.c:123
#13 0x00000000005e3bf8 in ExecProcNode (node=node@entry=0x52a5120) at execProcnode.c:476
#14 0x00000000006001d6 in ExecNestLoop (node=node@entry=0x52a3d50) at nodeNestloop.c:123
#15 0x00000000005e3bf8 in ExecProcNode (node=0x52a3d50) at execProcnode.c:476
#16 0x00000000006015e5 in ExecResult (node=node@entry=0x52a3140) at nodeResult.c:130
#17 0x00000000005e3d18 in ExecProcNode (node=node@entry=0x52a3140) at execProcnode.c:392
#18 0x00000000005fb360 in ExecLimit (node=node@entry=0x52a2e70) at nodeLimit.c:91
#19 0x00000000005e3af8 in ExecProcNode (node=node@entry=0x52a2e70) at execProcnode.c:531
#20 0x0000000000600299 in ExecNestLoop (node=node@entry=0x52a1a10) at nodeNestloop.c:174
#21 0x00000000005e3bf8 in ExecProcNode (node=node@entry=0x52a1a10) at execProcnode.c:476
#22 0x00000000006001d6 in ExecNestLoop (node=node@entry=0x52a16d0) at nodeNestloop.c:123
#23 0x00000000005e3bf8 in ExecProcNode (node=node@entry=0x52a16d0) at execProcnode.c:476
#24 0x00000000005dfdae in ExecutePlan (dest=0x50cbb00, direction=<optimized out>, numberTuples=0, sendTuples=<optimized out>, operation=CMD_SELECT, use_parallel_mode=<optimized out>, planstate=0x52a16d0, estate=0x3610968) at execMain.c:1567
#25 standard_ExecutorRun (queryDesc=0x36805b8, direction=<optimized out>, count=0) at execMain.c:338
#26 0x0000000000701a58 in PortalRunSelect (portal=portal@entry=0x529da38, forward=forward@entry=1 '\001', count=0, count@entry=9223372036854775807, dest=dest@entry=0x50cbb00) at pquery.c:946
#27 0x000000000070300e in PortalRun (portal=portal@entry=0x529da38, count=count@entry=9223372036854775807, isTopLevel=isTopLevel@entry=1 '\001', dest=dest@entry=0x50cbb00, altdest=altdest@entry=0x50cbb00, completionTag=completionTag@entry=0x7ffc55ce0ed0 "") at pquery.c:787
#28 0x0000000000700869 in exec_simple_query (query_string=0x45d3028 "select ...") at postgres.c:1094
#29 PostgresMain (argc=<optimized out>, argv=argv@entry=0x23ce878, dbname=<optimized out>, username=<optimized out>) at postgres.c:4069
#30 0x000000000046d9d9 in BackendRun (port=0x23d1ad0) at postmaster.c:4271
#31 BackendStartup (port=0x23d1ad0) at postmaster.c:3945
#32 ServerLoop () at postmaster.c:1701
#33 0x0000000000698ed9 in PostmasterMain (argc=argc@entry=4, argv=argv@entry=0x23a05c0) at postmaster.c:1309
#34 0x000000000046ebbd in main (argc=4, argv=0x23a05c0) at main.c:228
From be80954688c406122b560161192cc1d2e64e3757 Mon Sep 17 00:00:00 2001
From: Andreas Seltenreich <seltenre...@gmx.de>
Date: Mon, 5 Dec 2016 20:46:28 +0100
Subject: [PATCH] Fix potential crash on ReScanGather.

Initialize gatherstate->nextreader to 0 in order to prevent a crash
when ReScanGather gets less workers than the original scan, leading
to nextreader pointing outside the readers[nworkers] array.
---
 src/backend/executor/nodeGather.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/src/backend/executor/nodeGather.c b/src/backend/executor/nodeGather.c
index 880ca62..2bdf223 100644
--- a/src/backend/executor/nodeGather.c
+++ b/src/backend/executor/nodeGather.c
@@ -173,6 +173,7 @@ ExecGather(GatherState *node)
 		if (pcxt->nworkers_launched > 0)
 		{
 			node->nreaders = 0;
+			node->nextreader = 0;
 			node->reader =
 				palloc(pcxt->nworkers_launched * sizeof(TupleQueueReader *));
@@ -335,6 +336,7 @@ gather_readnext(GatherState *gatherstate)
 		CHECK_FOR_INTERRUPTS();
 
 		/* Attempt to read a tuple, but don't block if none is available. */
+		Assert(gatherstate->nextreader < gatherstate->nreaders);
 		reader = gatherstate->reader[gatherstate->nextreader];
 		tup = TupleQueueReaderNext(reader, true, &readerdone);
-- 
2.10.2
-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers