On Mon, Sep 14, 2020 at 8:48 AM Amit Kapila <amit.kapil...@gmail.com> wrote:
>
> On Mon, Sep 14, 2020 at 3:08 AM Tom Lane <t...@sss.pgh.pa.us> wrote:
> >
> > Amit Kapila <amit.kapil...@gmail.com> writes:
> > > Pushed.
> >
> > Observe the following reports:
> >
> > https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=idiacanthus&dt=2020-09-13%2016%3A54%3A03
> > https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=desmoxytes&dt=2020-09-10%2009%3A08%3A03
> > https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=komodoensis&dt=2020-09-05%2020%3A22%3A02
> > https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=dragonet&dt=2020-09-04%2001%3A52%3A03
> > https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=dragonet&dt=2020-09-03%2020%3A54%3A04
> >
> > These are all on HEAD, and all within the last ten days, and I see
> > nothing comparable in any branch before that. So it's hard to avoid
> > the conclusion that somebody broke something about ten days ago.
> >
> > None of these animals provided gdb backtraces; but we do have a built-in
> > trace from several, and they all look like pgoutput.so is trying to
> > list_free() garbage, somewhere inside a relcache invalidation/rebuild
> > scenario:
> >
>
> Yeah, this is right, and here is some initial analysis. It seems to be
> failing in below code:
> rel_sync_cache_relation_cb(){ ...list_free(entry->streamed_txns);..}
>
> This list can have elements only in 'streaming' mode (need to enable
> 'streaming' with Create Subscription command) whereas none of the
> tests in 010_truncate.pl is using 'streaming', so this list should be
> empty (NULL). The two different assertion failures shown in BF reports
> in list_free code are as below:
> Assert(list->length > 0);
> Assert(list->length <= list->max_length);
>
> It seems to me that this list is not initialized properly when it is
> not used or maybe that is true in some special circumstances because
> we initialize it in get_rel_sync_entry(). I am not sure if CCI build
> is impacting this in some way.

I have also analyzed this, but I could not find any reason why the
streamed_txns list would be anything other than NULL. The only
difference I can see is that we initialize entry->streamed_txns to
NULL, whereas list_free() checks "if (list == NIL)" before returning.
However, IMHO that should not be an issue, because NIL is defined as
(List *) NULL. I am doing further testing and investigation.

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com
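
For reference, here is a minimal standalone sketch of the NIL check discussed above. It is not the PostgreSQL source: the List struct is reduced to the two fields named in the failing Asserts, and sketch_list_free() and list_looks_valid() are hypothetical names used only for illustration. It shows why a pointer that really is NULL can never reach those Asserts, so a failure there suggests the pointer was non-NULL garbage.

```c
#include <stddef.h>
#include <stdlib.h>

/* Simplified stand-in for PostgreSQL's List (assumption: the real struct
 * has more fields; only the two checked by the Asserts are modeled). */
typedef struct List
{
    int length;      /* number of elements currently present */
    int max_length;  /* allocated capacity */
} List;

#define NIL ((List *) NULL)

/* Hypothetical helper: the two invariants quoted from the BF reports. */
static int
list_looks_valid(const List *list)
{
    return list->length > 0 && list->length <= list->max_length;
}

/*
 * Hypothetical sketch of the free path.
 * Returns 0 if the list was NIL (nothing to do),
 *         1 if the header passed the invariants and was freed,
 *        -1 if the header is garbage (where an Assert would fire).
 */
static int
sketch_list_free(List *list)
{
    if (list == NIL)
        return 0;               /* NULL and NIL compare equal, so we stop here */

    if (!list_looks_valid(list))
        return -1;              /* non-NULL pointer to something that is not a valid List */

    free(list);
    return 1;
}
```

With this model, passing NULL returns 0 without touching any field, while a non-NULL pointer to a zeroed header returns -1. Under these assumptions, the assertion failures would be consistent with entry->streamed_txns holding a stale or uninitialized non-NULL value at the time of the callback.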