After adding extended/MV stats to a few of our tables a few days ago, it looks
like I wasn't been paying attention and this first crashed 2 nights ago.  Why
at 1am?  not sure.  I have an "reindex" job which runs at 1am, and an
vacuum/analyze job which runs at 2am, but I don't use cron to change
autovac/analyze thresholds..

Nov 16 01:15:42 database kernel: postmaster[16144]: segfault at 0 ip 
00000000006f3e3e sp 00007ffe2c8fc650 error 4 in postgres[400000+693000]
[...]
Nov 17 01:15:41 database kernel: postmaster[7145]: segfault at 0 ip 
00000000006f3e3e sp 00007ffe2c8fc650 error 4 in postgres[400000+693000]

[pryzbyj@database ~]$ ps -fu postgres
UID        PID  PPID  C STIME TTY          TIME CMD
postgres  1757     1  0 Nov09 ?        00:20:43 /usr/pgsql-10/bin/postmaster -D 
/var/lib/pgsql/10/data
=> no longer running Alvaro's original patch on this server, which was also
first to experience the BRIN crash..

< 2017-11-16 01:15:43.103 -04  >LOG:  server process (PID 16144) was terminated 
by signal 11: Segmentation fault
< 2017-11-16 01:15:43.103 -04  >DETAIL:  Failed process was running: 
autovacuum: ANALYZE public.daily_enodeb_baseband_view_201711

< 2017-11-17 01:15:41.673 -04  >LOG:  server process (PID 7145) was terminated 
by signal 11: Segmentation fault
< 2017-11-17 01:15:41.673 -04  >DETAIL:  Failed process was running: 
autovacuum: ANALYZE public.daily_enodeb_baseband_view_201711

Here's a log of my reindex job for the last two nights' crashes.  It's
suspicious that the baseband table was reindexed ~15min before autovac crashed
during its processing, but it's also unclear to me why it would matter.

[...]
|Thu Nov 16 01:02:54 -04 2017: daily_enodeb_baseband_view_201711: 
daily_enodeb_baseband_view_201711_unique_idx(repack current)...
|Thu Nov 16 01:02:59 -04 2017: daily_enodeb_baseband_view_201711: 
pg_toast.pg_toast_691157026_index(reindex system)...
...
|Thu Nov 16 01:15:22 -04 2017: eric_hss_l2if_metrics_cum: 
eric_hss_l2if_metrics_cum_start_time_idx(repack non-partitioned)...
|Thu Nov 16 01:15:27 -04 2017: eric_hss_platform_metrics: 
eric_hss_platform_metrics_start_time_idx(repack non-partitioned)...
|WARNING:  terminating connection because of crash of another server process
|DETAIL:  The postmaster has commanded this server process to roll back the 
current transaction and exit, because another server process exited abnormally 
and possibly corrupted shared memory.
|HINT:  In a moment you should be able to reconnect to the database and repeat 
your command.
|WARNING: Error creating index "public"."index_40552734": server closed the 
connection unexpectedly
|        This probably means the server terminated abnormally
|        before or while processing the request.
|WARNING: Skipping index swapping for "eric_hss_platform_metrics", since no new 
indexes built
|WARNING: repack failed for "eric_hss_platform_metrics_start_time_idx"
|psql: FATAL:  the database system is in recovery mode
|Filesystem      Size  Used Avail Use% Mounted on
|/dev/vdb        2.6T  2.1T  459G  83% /var/lib/pgsql

[...]
|Fri Nov 17 01:01:44 -04 2017: daily_enodeb_baseband_view_201711: 
daily_enodeb_baseband_view_201711_unique_idx(repack current)...
|Fri Nov 17 01:01:54 -04 2017: daily_enodeb_baseband_view_201711: 
pg_toast.pg_toast_691157026_index(reindex system)...
...
|Fri Nov 17 01:14:58 -04 2017: link_busy_hr: lbh_start_time_idx(repack 
non-partitioned)...
|Fri Nov 17 01:14:58 -04 2017: loaded_cdr_files: 
loaded_cdr_files_filename(repack non-partitioned)...
|WARNING:  terminating connection because of crash of another server process
|DETAIL:  The postmaster has commanded this server process to roll back the 
current transaction and exit, because another server process exited abnormally 
and possibly corrupted shared memory.
|HINT:  In a moment you should be able to reconnect to the database and repeat 
your command.
|WARNING: Error creating index "public"."index_30971": server closed the 
connection unexpectedly
|        This probably means the server terminated abnormally
|        before or while processing the request.
|WARNING: Skipping index swapping for "loaded_cdr_files", since no new indexes 
built
|WARNING: repack failed for "loaded_cdr_files_filename"
|psql: FATAL:  the database system is in recovery mode

Core was generated by `postgres: autovacuum worker process   gtt             '.
Program terminated with signal 11, Segmentation fault.
#0  statext_ndistinct_build (totalrows=300, numrows=300, rows=0x21df3e8, 
attrs=<value optimized out>, stats=0x0) at mvdistinct.c:103
103                                     item->attrs = 
bms_add_member(item->attrs,

(gdb) p stats
$5 = (VacAttrStats **) 0x0
=> This is an error, no ??

(gdb) p item
$1 = (MVNDistinctItem *) 0x2090928
(gdb) p result
$6 = (MVNDistinct *) 0x2090918

(gdb) p item->attrs
$2 = (Bitmapset *) 0x0
(gdb) p rows
$3 = (HeapTuple *) 0x21df3e8
(gdb) p j
$8 = <value optimized out>
(gdb) p attrs
$4 = <value optimized out>

Let me know if there's anything more I should send.

Justin

(gdb) bt f
#0  statext_ndistinct_build (totalrows=300, numrows=300, rows=0x21df3e8, 
attrs=<value optimized out>, stats=0x0) at mvdistinct.c:103
        item = 0x2090928
        j = <value optimized out>
        combination = 0x20908c8
        generator = 0x2090d08
        result = 0x2090918
        k = 2
        itemcnt = 0
        numattrs = 4
        numcombs = <value optimized out>

#1  0x00000000006f1a0e in BuildRelationExtStatistics (onerel=0x7f7895485fa0, 
totalrows=300, numrows=300, rows=0x21df3e8, natts=34147512, vacattrstats=0x0) 
at extended_stats.c:120
        t = <value optimized out>
        stat = 0x2090b48
        stats = <value optimized out>
        lc2 = 0x2090c48
        ndistinct = <value optimized out>
        dependencies = <value optimized out>
        pg_stext = 0x7f7895495478
        lc = 0x2090cb8
        stats = 0x0
        cxt = 0x2054cd8
        oldcxt = 0x1fd0f58
        __func__ = "BuildRelationExtStatistics"

#2  0x000000000057aad6 in do_analyze_rel (onerel=0x7f7895485fa0, options=98, 
params=0x2094e90, va_cols=0x0, acquirefunc=0x57bb20 <acquire_sample_rows>, 
relpages=80, inh=0 '\000', in_outer_xact=0 '\000',
    elevel=13) at analyze.c:593
        col_context = 0x2054cd8
        old_context = 0x1fd0f58
        attr_cnt = 24
        tcnt = <value optimized out>
        i = <value optimized out>
        ind = <value optimized out>
        Irel = 0x2074838
        nindexes = 1
        hasindex = <value optimized out>
        vacattrstats = 0x2095868
        indexdata = 0x2097810
        targrows = <value optimized out>
        numrows = 300
        totalrows = 300
        totaldeadrows = 20
        rows = 0x21df3e8
        ru0 = {tv = {tv_sec = 1510809341, tv_usec = 958864}, ru = {ru_utime = 
{tv_sec = 0, tv_usec = 436933}, ru_stime = {tv_sec = 0, tv_usec = 18997}, 
ru_maxrss = 14988, ru_ixrss = 0, ru_idrss = 0,
            ru_isrss = 0, ru_minflt = 5090, ru_majflt = 0, ru_nswap = 0, 
ru_inblock = 2560, ru_oublock = 240, ru_msgsnd = 0, ru_msgrcv = 0, ru_nsignals 
= 0, ru_nvcsw = 163, ru_nivcsw = 26}}
        starttime = 0
        caller_context = 0x2055338
        save_userid = <value optimized out>
        save_sec_context = 0
        save_nestlevel = 2
        __func__ = "do_analyze_rel"

#3  0x000000000057b7a4 in analyze_rel (relid=<value optimized out>, 
relation=0x7ffe2c8fcfc0, options=98, params=0x2094e90, va_cols=0x0, 
in_outer_xact=0 '\000', bstrategy=0x21c1358) at analyze.c:276
        onerel = 0x7f7895485fa0
        elevel = 13
        acquirefunc = 0x57bb20 <acquire_sample_rows>
        relpages = 80
        __func__ = "analyze_rel"

#4  0x00000000005e6edf in vacuum (options=98, relation=0x7ffe2c8fcfc0, 
relid=<value optimized out>, params=0x2094e90, va_cols=0x0, bstrategy=<value 
optimized out>, isTopLevel=1 '\001') at vacuum.c:323
        relid = 691157026
        cur = 0x208a8e8
        save_exception_stack = 0x7ffe2c8fce30
        save_context_stack = 0x0
        local_sigjmp_buf = {{__jmpbuf = {33904488, 2571701076356583921, 
9927477, 4294967295, 200, 20, -2572437610069143055, 2571700781410898417}, 
__mask_was_saved = 0, __saved_mask = {__val = {0, 0, 0, 0,
                34861944, 0, 140155877187248, 140160420441688, 2, 
140729646043152, 257015007210, 140160420441688, 8282462, 1510809341, 958742, 
140160420553984}}}}
        stmttype = <value optimized out>
        in_outer_xact = 0 '\000'
        use_own_xacts = 1 '\001'
        relations = 0x208a908
        in_vacuum = 1 '\001'
        __func__ = "vacuum"

#5  0x00000000006a7d22 in autovacuum_do_vac_analyze () at autovacuum.c:3096
        rangevar = {type = T_Invalid, catalogname = 0x0, schemaname = 0x213f378 
"public", relname = 0x21c18b8 "daily_enodeb_baseband_view_201711", inh = 0 
'\000', relpersistence = 0 '\000', alias = 0x0, 
          location = -1}
#6  do_autovacuum () at autovacuum.c:2451
        save_exception_stack = 0x7ffe2c8fd110
        save_context_stack = 0x0
        local_sigjmp_buf = {{__jmpbuf = {691157026, 2571696918420345329, 
34163832, 4294967295, 200, 20, -2572437610046074383, 2571700883870929393}, 
__mask_was_saved = 0, __saved_mask = {__val = {0, 0, 
                140729646043392, 15874306463571202545, 2571702211223048689, 0, 
6998306, 140160420441688, 747622240, 13239856, 13239680, 0, 6444974, 13241760, 
0, 0}}}}
        tab = 0x2094e88
        skipit = 0 '\000'
        stdVacuumCostDelay = 0
        stdVacuumCostLimit = 200
        iter = <value optimized out>
        relid = 691157026
        classRel = 0x7f7895453e58
        tuple = <value optimized out>
        relScan = <value optimized out>
        dbForm = <value optimized out>
        table_oids = <value optimized out>
        orphan_oids = 0x0
        ctl = {num_partitions = 0, ssize = 0, dsize = 0, max_dsize = 0, ffactor 
= 0, keysize = 4, entrysize = 80, hash = 0, match = 0, keycopy = 0, alloc = 0, 
hcxt = 0x0, hctl = 0x0}
        table_toast_map = 0x20a5338
        cell = 0x2095020
        shared = 0x206a208
        dbentry = 0x206a490
        bstrategy = 0x21c1358
        key = {sk_flags = 0, sk_attno = 16, sk_strategy = 3, sk_subtype = 0, 
sk_collation = 100, sk_func = {fn_addr = 0x750cd0 <chareq>, fn_oid = 61, 
fn_nargs = 2, fn_strict = 1 '\001', 
            fn_retset = 0 '\000', fn_stats = 2 '\002', fn_extra = 0x0, fn_mcxt 
= 0x1fd0378, fn_expr = 0x0}, sk_argument = 116}
        pg_class_desc = 0x2093858
        effective_multixact_freeze_max_age = 400000000
        did_vacuum = <value optimized out>
        found_concurrent_worker = 0 '\000'
        i = <value optimized out>
        __func__ = "do_autovacuum"
#7  0x00000000006a84d8 in AutoVacWorkerMain (argc=<value optimized out>, 
argv=<value optimized out>) at autovacuum.c:1694
        dbname = 
"gtt\000\000\000\000\000\325I\001\327;\000\000\000\020?\000\000\000\000\000\000E\352\200\327;\000\000\000\001\000\000\000\000\000\000\000\002\000\000\000\000\000\000\000\374$\316",
 '\000' <repeats 12 times>
        local_sigjmp_buf = {{__jmpbuf = {0, 2571701076356059633, 33315888, 
1510809326, 16143, 1510808581, -2572437610981404175, 2571700891614532081}, 
__mask_was_saved = 1, __saved_mask = {__val = {
                18446744066192964103, 13188120, 1, 13186760, 0, 16143, 
1510808581, 0, 257010229664, 1, 0, 140729646043600, 257014425560, 4, 0, 4}}}}
        dbid = 16400
        __func__ = "AutoVacWorkerMain"
#8  0x00000000006a85a6 in StartAutoVacWorker () at autovacuum.c:1499
        worker_pid = 0
        __func__ = "StartAutoVacWorker"
#9  0x00000000006b606a in StartAutovacuumWorker (postgres_signal_arg=<value 
optimized out>) at postmaster.c:5414
        bn = 0x1fc5c30
#10 sigusr1_handler (postgres_signal_arg=<value optimized out>) at 
postmaster.c:5111
        save_errno = 4
        __func__ = "sigusr1_handler"

Reply via email to