On 08/14/2018 01:49 PM, Tomas Vondra wrote:
On 08/13/2018 04:49 PM, Andres Freund wrote:
Hi,

On 2018-08-13 11:46:30 -0300, Alvaro Herrera wrote:
On 2018-Aug-11, Tomas Vondra wrote:

Hmmm, it's difficult to compare "bt full" output, but my backtraces look
somewhat different (and all the backtraces I'm seeing are 100% exactly
the same). Attached for comparison.

Hmm, looks similar enough to me -- at the bottom you have the executor
doing its thing, then an AcceptInvalidationMessages in the middle
section atop which sit a few more catalog accesses, and further up from
that you have another AcceptInvalidationMessages with more catalog
accesses.  AFAICS that's pretty much the same thing Andres was
describing.

It's somewhat different because it doesn't seem to involve a reload of a
nailed table, which my traces did.  I wasn't able to reproduce the crash
more than once, so I'm not at all sure how to properly verify the issue.
I'd appreciate it if Tomas could try to do so again with the small patch I
provided.


I'll try in the evening. I've tried reproducing it on my laptop, but I can't make that happen for some reason - I know I've seen some crashes here, but all the reproducers were from the workstation I have at home.

I wonder if there's some subtle difference between the two boxes, making it more likely on one of them ... the whole environment (distribution, packages, compiler, ...) should be exactly the same, though. The only thing I can think of is different CPU speed, possibly making some race conditions more/less likely. No idea.


I take that back - I can reproduce the crashes, both with and without the patch, all the way back to 9.6. Attached is a bunch of backtraces from various versions. There's a bit of variability depending on which pgbench script gets started first (insert.sql or vacuum.sql) - in one case (when vacuum is started before insert) the crash happens in InitPostgres/RelationCacheInitializePhase3, otherwise it happens in exec_simple_query.

Another observation is that the failing COPY is not necessary: I can reproduce the crashes without it (so even with wal_level=replica).

regards

--
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Attachment: crash-10.log.gz
Description: application/gzip

Attachment: crash-11.log.gz
Description: application/gzip

Attachment: crash-11-2.log.gz
Description: application/gzip

Attachment: crash-11-3.log.gz
Description: application/gzip

Attachment: crash-96.log.gz
Description: application/gzip

Attachment: crash-96-2.log.gz
Description: application/gzip

Attachment: crash-96-logical.log.gz
Description: application/gzip
