Re: [HACKERS] [sqlsmith] Crash in gather_readnext

2016-12-05 Thread Robert Haas
On Mon, Dec 5, 2016 at 3:07 PM, Andreas Seltenreich  wrote:
> on master as of a0ae54d, there's a 1 in 10e6 chance sqlsmith catches
> gather_readnext reading beyond the gatherstate->readers array with
> readers[gatherstate->nextreader].  Sample backtrace below.
>
> As nextreader is never explicitly initialized, I think what happens is
> that a rescan gets fewer workers than the initial scan, and the dangling
> nextreader points outside the array.  I'm no longer seeing these crashes
> when explicitly initializing nextreader to 0 like in the attached patch.

Thanks, great catch!  Committed and back-patched to 9.6.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




[HACKERS] [sqlsmith] Crash in gather_readnext

2016-12-05 Thread Andreas Seltenreich
Hi,

on master as of a0ae54d, there's a 1 in 10e6 chance sqlsmith catches
gather_readnext reading beyond the gatherstate->readers array with
readers[gatherstate->nextreader].  Sample backtrace below.

As nextreader is never explicitly initialized, I think what happens is
that a rescan gets fewer workers than the initial scan, and the dangling
nextreader points outside the array.  I'm no longer seeing these crashes
when explicitly initializing nextreader to 0 like in the attached patch.

regards,
Andreas

Program terminated with signal SIGSEGV, Segmentation fault.
#0  shm_mq_receive (mqh=0x259, nbytesp=nbytesp@entry=0x7ffc55ce0580, datap=datap@entry=0x7ffc55ce0588, nowait=nowait@entry=1 '\001') at shm_mq.c:520
520 shm_mq *mq = mqh->mqh_queue;
(gdb) bt
#0  shm_mq_receive (mqh=0x259, nbytesp=nbytesp@entry=0x7ffc55ce0580, datap=datap@entry=0x7ffc55ce0588, nowait=nowait@entry=1 '\001') at shm_mq.c:520
#1  0x0060b8b7 in TupleQueueReaderNext (reader=reader@entry=0x5446c10, nowait=nowait@entry=1 '\001', done=done@entry=0x7ffc55ce065b "") at tqueue.c:692
#2  0x005f5e03 in gather_readnext (gatherstate=0x52a9918) at nodeGather.c:339
#3  gather_getnext (gatherstate=0x52a9918) at nodeGather.c:292
#4  ExecGather (node=node@entry=0x52a9918) at nodeGather.c:233
#5  0x005e3b68 in ExecProcNode (node=0x52a9918) at execProcnode.c:515
#6  0x005eb2f2 in ExecScanFetch (recheckMtd=0x605e40 , accessMtd=0x605e50 , node=0x52a86c0) at execScan.c:95
#7  ExecScan (node=node@entry=0x52a86c0, accessMtd=accessMtd@entry=0x605e50 , recheckMtd=recheckMtd@entry=0x605e40 ) at execScan.c:180
#8  0x00605e6f in ExecSubqueryScan (node=node@entry=0x52a86c0) at nodeSubqueryscan.c:85
#9  0x005e3c68 in ExecProcNode (node=node@entry=0x52a86c0) at execProcnode.c:445
#10 0x006001d6 in ExecNestLoop (node=node@entry=0x52a7978) at nodeNestloop.c:123
#11 0x005e3bf8 in ExecProcNode (node=node@entry=0x52a7978) at execProcnode.c:476
#12 0x006001d6 in ExecNestLoop (node=node@entry=0x52a5120) at nodeNestloop.c:123
#13 0x005e3bf8 in ExecProcNode (node=node@entry=0x52a5120) at execProcnode.c:476
#14 0x006001d6 in ExecNestLoop (node=node@entry=0x52a3d50) at nodeNestloop.c:123
#15 0x005e3bf8 in ExecProcNode (node=0x52a3d50) at execProcnode.c:476
#16 0x006015e5 in ExecResult (node=node@entry=0x52a3140) at nodeResult.c:130
#17 0x005e3d18 in ExecProcNode (node=node@entry=0x52a3140) at execProcnode.c:392
#18 0x005fb360 in ExecLimit (node=node@entry=0x52a2e70) at nodeLimit.c:91
#19 0x005e3af8 in ExecProcNode (node=node@entry=0x52a2e70) at execProcnode.c:531
#20 0x00600299 in ExecNestLoop (node=node@entry=0x52a1a10) at nodeNestloop.c:174
#21 0x005e3bf8 in ExecProcNode (node=node@entry=0x52a1a10) at execProcnode.c:476
#22 0x006001d6 in ExecNestLoop (node=node@entry=0x52a16d0) at nodeNestloop.c:123
#23 0x005e3bf8 in ExecProcNode (node=node@entry=0x52a16d0) at execProcnode.c:476
#24 0x005dfdae in ExecutePlan (dest=0x50cbb00, direction=, numberTuples=0, sendTuples=, operation=CMD_SELECT, use_parallel_mode=, planstate=0x52a16d0, estate=0x3610968) at execMain.c:1567
#25 standard_ExecutorRun (queryDesc=0x36805b8, direction=, count=0) at execMain.c:338
#26 0x00701a58 in PortalRunSelect (portal=portal@entry=0x529da38, forward=forward@entry=1 '\001', count=0, count@entry=9223372036854775807, dest=dest@entry=0x50cbb00) at pquery.c:946
#27 0x0070300e in PortalRun (portal=portal@entry=0x529da38, count=count@entry=9223372036854775807, isTopLevel=isTopLevel@entry=1 '\001', dest=dest@entry=0x50cbb00, altdest=altdest@entry=0x50cbb00, completionTag=completionTag@entry=0x7ffc55ce0ed0 "") at pquery.c:787
#28 0x00700869 in exec_simple_query (query_string=0x45d3028 "select ...") at postgres.c:1094
#29 PostgresMain (argc=, argv=argv@entry=0x23ce878, dbname=, username=) at postgres.c:4069
#30 0x0046d9d9 in BackendRun (port=0x23d1ad0) at postmaster.c:4271
#31 BackendStartup (port=0x23d1ad0) at postmaster.c:3945
#32 ServerLoop () at postmaster.c:1701
#33 0x00698ed9 in PostmasterMain (argc=argc@entry=4, argv=argv@entry=0x23a05c0) at postmaster.c:1309
#34 0x0046ebbd in main (argc=4, argv=0x23a05c0) at main.c:228
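
For readers following along, the suspected failure mode can be sketched
as a small standalone model.  This is only an illustration under stated
assumptions, not the PostgreSQL code: ToyGatherState and all toy_*
functions are hypothetical names, and the worker counts (4, then 2) are
made up.  It models a round-robin index that survives a rescan which
sets up fewer readers, and shows that resetting the index to 0 at
setup time keeps it in bounds.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the Gather node's state; not the real struct. */
typedef struct ToyGatherState
{
    int nreaders;   /* readers set up for the current scan */
    int nextreader; /* index of the reader to poll next */
} ToyGatherState;

/*
 * Set up readers for a scan or rescan.  reset_index == true models the
 * behavior proposed in the patch (start again at reader 0);
 * reset_index == false models leaving the old index in place.
 */
static void
toy_launch_readers(ToyGatherState *gs, int nworkers, bool reset_index)
{
    gs->nreaders = nworkers;
    if (reset_index)
        gs->nextreader = 0;
}

/* Advance round-robin to the next reader. */
static void
toy_advance(ToyGatherState *gs)
{
    assert(gs->nreaders > 0);
    gs->nextreader = (gs->nextreader + 1) % gs->nreaders;
}

/* Would readers[nextreader] be a valid array access? */
static bool
toy_index_in_bounds(const ToyGatherState *gs)
{
    return gs->nextreader >= 0 && gs->nextreader < gs->nreaders;
}

/*
 * Initial scan with 4 readers, leaving the index at 3, followed by a
 * rescan that only gets 2 readers.  Returns whether the index is still
 * usable after the rescan.
 */
bool
toy_rescan_is_safe(bool reset_index)
{
    ToyGatherState gs = {0, 0};

    toy_launch_readers(&gs, 4, true);
    for (int i = 0; i < 3; i++)
        toy_advance(&gs);       /* nextreader is now 3 */

    toy_launch_readers(&gs, 2, reset_index);
    return toy_index_in_bounds(&gs);
}
```

In this model, toy_rescan_is_safe(false) returns false (index 3 with
only 2 readers), which corresponds to reading a garbage pointer such as
the mqh=0x259 in frame #0, while toy_rescan_is_safe(true) returns true.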

From be80954688c406122b560161192cc1d2e64e3757 Mon Sep 17 00:00:00 2001
From: Andreas Seltenreich 
Date: Mon, 5 Dec 2016 20:46:28 +0100
Subject: [PATCH] Fix potential crash on ReScanGather.

Initialize gatherstate->nextreader to 0 in order to prevent a crash
when ReScanGather gets less workers than the original scan, leading to
nextreader pointing outside the readers[nworkers] array.
---
 src/backend/executor/nodeGather.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/src/backend/executor/nodeGather.c b/src/backend/executor/nodeGather.c
index 880ca62..2bdf223 100644
--- a/src/backend/executor/nodeGather.c
+++ b/src/backend/executor/n