Alvaro Herrera wrote:
> Tom Lane wrote:
>> Stefan Kaltenbrunner <[EMAIL PROTECTED]> writes:
>>> Stefan Kaltenbrunner wrote:
>>>> two of my buildfarm members had different but pretty weird looking
>>>> failures lately:
>>>> and
>>>> any ideas on what might be causing those?
> Just for the record, quagga and emu failures don't seem related to the
> report below.  They don't crash; the regression.diffs contains data that
> suggests that there may be data corruption of some sort.
> INSERT INTO INET_TBL (c, i) VALUES ('', '');
> ERROR:  invalid cidr value: "%{"
> This doesn't seem to make much sense.

Yeah, on further reflection the failures from emu and quagga seem
unrelated to the issue lionfish is experiencing.

>>> lionfish just failed too:
>> And had a similar failure a few days ago.  The curious thing is that
>> what we get in the postmaster log is
>> LOG:  server process (PID 23405) was terminated by signal 6: Aborted
>> LOG:  terminating any other active server processes
>> You would think SIGABRT would come from an assertion failure, but
>> there's no preceding assertion message in the log.  The other
>> characteristic of these crashes is that *all* of the failing regression
>> instances report "terminating connection because of crash of another
>> server process", which suggests strongly that the crash was in an
>> autovacuum process (if it were bgwriter or stats collector the
>> postmaster would've said so).  So I think the recent autovac patches
>> are at fault.  I spent a bit of time trolling for a spot where the code
>> might abort() without having printed anything, but didn't find one.
> Hmm.  I kept an eye on the buildfarm for a few days, but saw nothing
> that could be connected to autovacuum so I neglected it.
> This is the other failure:
> It shows the same pattern.  I am baffled -- I don't understand how it
> can die without reporting the error.

I should have mentioned this initially, but I think the failure from
2007-04-20 is not related at all.
That failure was very likely caused by the kernel running completely out
of memory (lionfish is a very resource-starved box, with only 48MB of RAM
and 128MB of swap at that time - do we have a recent patch that increases
memory usage quite a lot?).
I immediately added another 128MB of swap after that, and I don't think
the failure from yesterday has the same cause (at least there are no
kernel log entries that indicate a similar issue).
> Apparently it crashes rather frequently, so it shouldn't be too
> difficult to reproduce on manual runs.  If we could get it to run with a
> higher debug level, it might prove helpful to further pinpoint the
> problem.

A manual run of the buildfarm script takes ~4.5 hours on lionfish ;-)

> The core file would be much better obviously (first and foremost to
> confirm that it's autovacuum that's crashing ... )

I will see what I can come up with ...
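
For reference, here is a minimal sketch (not what the buildfarm script
actually does) of the two things involved in getting a core file: raising
the core-size limit in the process that launches the server, and confirming
that a child which calls abort() is reported as killed by signal 6 without
printing anything itself. The helper names raise_core_limit and
run_aborting_child are made up for illustration, and this assumes a Unix box
with Python available. Whether a core file is really written also depends on
the kernel's core settings, which this does not touch.

```python
import resource
import signal
import subprocess
import sys


def raise_core_limit():
    """Raise the soft core-file size limit to the hard limit.

    The limit is inherited by child processes, so it has to be raised
    in whatever process ends up starting the server.
    """
    _soft, hard = resource.getrlimit(resource.RLIMIT_CORE)
    resource.setrlimit(resource.RLIMIT_CORE, (hard, hard))
    return resource.getrlimit(resource.RLIMIT_CORE)


def run_aborting_child():
    """Run a child that calls abort(); return the signal number that killed it.

    Returns None if the child exited normally instead of dying on a signal.
    """
    proc = subprocess.run(
        [sys.executable, "-c", "import os; os.abort()"],
        stderr=subprocess.DEVNULL,
    )
    # subprocess reports death by signal N as returncode == -N.
    return -proc.returncode if proc.returncode < 0 else None
```

Calling run_aborting_child() and comparing the result with signal.SIGABRT
shows the same "terminated by signal 6" outcome the postmaster logs, with no
message from the dying process itself. If the hard limit is 0,
raise_core_limit() cannot help and the limit has to be raised by the
administrator.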


