Hello,
We encountered a query that has been able to frequently segfault one of our
postgres instances under certain conditions which we have not fully been able
to isolate for reproduction. We were able to get a core dump out of one of the
crashes and have poked at it, but we believe the answer is beyond our knowledge
of postgres internals. This is on a 9.3.19 server and we saw no mention of a
fix in the release notes since this version and we do not know if it affects
later major releases as well.
What we know so far is that somehow the arraycontains function was given a
datum of 0 as the second argument that dereferenced to a null pointer. Our
current hypothesis from poking at the core dump is that some memory context is
getting freed before it should. This assumption comes from the complexity in
the query (CTE containing params being repeatedly evaluated by multiple case
statements) and the unpredictability of the failure case.
The issue is easily avoidable, and we have asked the developer to solve their
problem differently. However, the existence of a segfault is always concerning
and we are reporting this issue in an effort to be conscientiousness members of
the community.
Due to the potentially sensitive contents we cannot provide the core directly,
but we are happy to run commands against the core file to extract debugging
information. We have also replaced certain values (database name, table name,
column name) with generic identifiers.
version
----------------------------------------------------------------------------------------------------------------
PostgreSQL 9.3.19 on x86_64-unknown-linux-gnu, compiled by gcc (GCC) 4.4.7
20120313 (Red Hat 4.4.7-18), 64-bit
(1 row)
Query:
WITH tmp(foo2, bar2, baz2, minutes2) AS
(
SELECT ARRAY[$1 ::text], ARRAY[$2 ::text], ARRAY[$3 ::text], $4 ::double
precision
)
UPDATE table_name
SET occurrences = CASE
WHEN time_last_noticed >= current_timestamp - (SELECT
minutes2 FROM tmp) * interval '1 minutes'
THEN occurrences + 1
ELSE 1
END,
foo = CASE
WHEN time_last_noticed < current_timestamp - (SELECT minutes2
FROM tmp) * interval '1 minutes'
THEN (SELECT foo2 FROM tmp)
WHEN foo @> (SELECT foo2 FROM tmp)
THEN foo
ELSE array_cat(foo, (SELECT foo2 FROM tmp))
END,
bar = CASE
WHEN time_last_noticed < current_timestamp - (SELECT minutes2
FROM tmp) * interval '1 minutes'
THEN (SELECT bar2 FROM tmp)
WHEN bar @> (SELECT bar2 FROM tmp)
THEN bar
ELSE array_cat(bar, (SELECT bar2 FROM tmp))
END,
baz = CASE
WHEN time_last_noticed < current_timestamp - (SELECT
minutes2 FROM tmp) * interval '1 minutes'
THEN (SELECT baz2 FROM tmp)
WHEN baz @> (SELECT baz2 FROM tmp)
THEN baz
ELSE array_cat(baz, (SELECT baz2 FROM tmp))
END,
time_last_noticed = current_timestamp,
total_occurrences = total_occurrences + 1
WHERE id1 = $5 AND id2 = $6;
Table schema:
Table "public.table_name"
Column | Type | Modifiers | Storage |
Stats target | Description
-------------------+-----------------------------+-----------+----------+--------------+-------------
id1 | text | not null | extended |
|
id2 | text | not null | extended |
|
occurrences | integer | not null | plain |
|
time_last_noticed | timestamp without time zone | not null | plain |
|
total_occurrences | integer | not null | plain |
|
bar | text[] | | extended |
|
baz | text[] | | extended |
|
foo | text[] | | extended |
|
Indexes:
"table_name_pkey" PRIMARY KEY, btree (id1, id2)
Back trace from the dump:
(gdb) bt
#0 pg_detoast_datum (datum=0x0) at fmgr.c:2241
#1 0x000000000067cd90 in arraycontains (fcinfo=0x2c56e30) at
arrayfuncs.c:3841
#2 0x000000000058ba25 in ExecMakeFunctionResultNoSets (fcache=0x2c56dc0,
econtext=0x2c500c0, isNull=0x7ffc571d9a3f "", isDone=<value optimized out>) at
execQual.c:2027
#3 0x0000000000587375 in ExecEvalCase (caseExpr=0x2c54a30,
econtext=0x2c500c0, isNull=0x2c60967 "", isDone=0x2c60c4c) at execQual.c:2985
#4 0x00000000005878c3 in ExecTargetList (projInfo=<value optimized out>,
isDone=0x7ffc571d9b0c) at execQual.c:5322
#5 ExecProject (projInfo=<value optimized out>, isDone=0x7ffc571d9b0c) at
execQual.c:5537
#6 0x000000000058dd62 in ExecScan (node=0x2c4ffb0, accessMtd=0x599340
<IndexNext>, recheckMtd=0x5992f0 <IndexRecheck>) at execScan.c:207
#7 0x0000000000586e28 in ExecProcNode (node=0x2c4ffb0) at
execProcnode.c:404
#8 0x0000000000584472 in EvalPlanQualNext (epqstate=<value optimized out>)
at execMain.c:2366
#9 0x0000000000584b47 in EvalPlanQual (estate=0x2bdd240,
epqstate=0x2c0d638, relation=<value optimized out>, rti=1, lockmode=<value
optimized out>, tid=0x7ffc571d9c60, priorXmax=2185728490) at execMain.c:1951
#10 0x000000000059d7cb in ExecUpdate (node=0x2c0d598) at
nodeModifyTable.c:727
#11 ExecModifyTable (node=0x2c0d598) at nodeModifyTable.c:997
#12 0x0000000000586e78 in ExecProcNode (node=0x2c0d598) at
execProcnode.c:377
#13 0x0000000000585a82 in ExecutePlan (queryDesc=0x2bdd000,
direction=<value optimized out>, count=0) at execMain.c:1488
#14 standard_ExecutorRun (queryDesc=0x2bdd000, direction=<value optimized
out>, count=0) at execMain.c:318
#15 0x00007f2e5f3b358b in explain_ExecutorRun (queryDesc=0x2bdd000,
direction=ForwardScanDirection, count=0) at auto_explain.c:231
#16 0x00007f2e5f1af495 in pgss_ExecutorRun (queryDesc=0x2bdd000,
direction=ForwardScanDirection, count=0) at pg_stat_statements.c:716
#17 0x000000000066553f in ProcessQuery (plan=0x2be32b0,
sourceText=0x2be7fe0 "WITH tmp(foo2, bar2, baz2, minutes2) AS ( SELECT ARRAY[$1
::text], ARRAY[$2 ::text], ARRAY[$3 ::text], $4 ::double precision ) UPDATE
table_name SET occurrences = CAS"..., params=0x2bdce30, dest=<value optimized
out>, completionTag=0x7ffc571da2f0 "") at pquery.c:185
#18 0x000000000066576f in PortalRunMulti (portal=0x2bd2530, isTopLevel=1
'\001', dest=0xb1d380, altdest=0xb1d380, completionTag=0x7ffc571da2f0 "") at
pquery.c:1275
#19 0x0000000000665e32 in PortalRun (portal=0x2bd2530,
count=9223372036854775807, isTopLevel=1 '\001', dest=0x2ba30b0,
altdest=0x2ba30b0, completionTag=0x7ffc571da2f0 "") at pquery.c:812
#20 0x000000000066409d in exec_execute_message (argc=<value optimized out>,
argv=<value optimized out>, dbname=0x2965280 "database_name", username=<value
optimized out>) at postgres.c:1958
#21 PostgresMain (argc=<value optimized out>, argv=<value optimized out>,
dbname=0x2965280 "database_name", username=<value optimized out>) at
postgres.c:4154
#22 0x000000000061df38 in BackendRun (argc=<value optimized out>,
argv=<value optimized out>) at postmaster.c:4194
#23 BackendStartup (argc=<value optimized out>, argv=<value optimized out>)
at postmaster.c:3857
#24 ServerLoop (argc=<value optimized out>, argv=<value optimized out>) at
postmaster.c:1702
#25 PostmasterMain (argc=<value optimized out>, argv=<value optimized out>)
at postmaster.c:1326
#26 0x00000000005b92f8 in main (argc=5, argv=0x2964470) at main.c:234
fcinfo contents from the arraycontains in frame #1
(gdb) print *fcinfo
$1 = {flinfo = 0x2c56de0, context = 0x0, resultinfo = 0x0, fncollation =
100, isnull = 0 '\000', nargs = 2, arg = {46431788, 0 <repeats 99 times>},
argnull = '\000' <repeats 99 times>}
- Kyle Samson