One of our servers crashed last night like this:
< 2019-10-10 22:31:02.186 EDT postgres >STATEMENT: REINDEX INDEX CONCURRENTLY child.eric_umts_rnc_utrancell_hsdsch_eul_201910_site_idx
< 2019-10-10 22:31:02.399 EDT >LOG: server process (PID 29857) was terminated by signal 11: Segmentation fault
< 2019-10-10 22:31:02.399 EDT >DETAIL: Failed process was running: REINDEX INDEX CONCURRENTLY child.eric_umts_rnc_utrancell_hsdsch_eul_201910_site_idx
< 2019-10-10 22:31:02.399 EDT >LOG: terminating any other active server processes
ts=# \d+ child.eric_umts_rnc_utrancell_hsdsch_eul_201910_site_idx
Index "child.eric_umts_rnc_utrancell_hsdsch_eul_201910_site_idx"
Column | Type | Key? | Definition | Storage | Stats target
---------+---------+------+------------+---------+--------------
site_id | integer | yes | site_id | plain |
btree, for table "child.eric_umts_rnc_utrancell_hsdsch_eul_201910"
That's an index on a table partition, but not itself a child of a relkind=I
index.
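For anyone wanting to check the same thing on their own index, here is a minimal sketch of a catalog query. It assumes the usual arrangement where an index that is attached to a partitioned (relkind=I) index has a row in pg_inherits pointing at that parent; the column aliases are only illustrative.

```sql
-- Show the index's own relkind and, if it is attached to a partitioned
-- index, the parent's name and relkind.  parent_index is NULL when the
-- index has no parent, i.e. it was created directly on the partition.
SELECT c.oid::regclass AS index_name,
       c.relkind       AS index_relkind,    -- 'i' for a plain index
       p.oid::regclass AS parent_index,
       p.relkind       AS parent_relkind    -- 'I' for a partitioned index
FROM pg_class c
LEFT JOIN pg_inherits i ON i.inhrelid = c.oid
LEFT JOIN pg_class p ON p.oid = i.inhparent
WHERE c.oid = 'child.eric_umts_rnc_utrancell_hsdsch_eul_201910_site_idx'::regclass;
```

A NULL parent_index would match the situation described here: an index on a partition that is not itself a child of a relkind=I index.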
Unfortunately, there was no core file, and I'm still trying to reproduce it.
I can't see that the table was INSERTed into during the reindex...
But it looks like it was SELECTed from, and the report finished within 1 sec of
the crash:
(2019-10-10 22:30:50,485 - p1604 t140325365622592 - INFO): PID 1604 finished running report; est=None rows=552; cols=83; [...] duration:12
postgres=# SELECT log_time, pid, session_id, left(message,99), detail
FROM postgres_log_2019_10_10_2200
WHERE pid=29857 OR (log_time BETWEEN '2019-10-10 22:31:02.18' AND '2019-10-10 22:31:02.4' AND NOT message~'crash of another')
ORDER BY log_time LIMIT 9;
2019-10-10 22:30:24.441-04 | 29857 | 5d9fe93f.74a1 | temporary file: path "base/pgsql_tmp/pgsql_tmp29857.0.sharedfileset/0.0", size 3096576 |
2019-10-10 22:30:24.442-04 | 29857 | 5d9fe93f.74a1 | temporary file: path "base/pgsql_tmp/pgsql_tmp29857.0.sharedfileset/1.0", size 2809856 |
2019-10-10 22:30:24.907-04 | 29857 | 5d9fe93f.74a1 | process 29857 still waiting for ShareLock on virtual transaction 30/103010 after 333.078 ms | Process holding the lock: 29671. Wait queue: 29857.
2019-10-10 22:31:02.186-04 | 29857 | 5d9fe93f.74a1 | process 29857 acquired ShareLock on virtual transaction 30/103010 after 37611.995 ms |
2019-10-10 22:31:02.186-04 | 29671 | 5d9fe92a.73e7 | duration: 50044.778 ms statement: SELECT fn, sz FROM +|
| | | (SELECT file_name fn, file_size_bytes sz, +|
| | | |
2019-10-10 22:31:02.399-04 | 1161 | 5d9cad9e.489 | terminating any other active server processes |
2019-10-10 22:31:02.399-04 | 1161 | 5d9cad9e.489 | server process (PID 29857) was terminated by signal 11: Segmentation fault | Failed process was running: REINDEX INDEX CONCURRENTLY child.eric_umts_rnc_utrancell_hsdsch_eul_201910_site_idx
Justin