On 12/5/14, 4:11 PM, Peter Geoghegan wrote:
On Fri, Dec 5, 2014 at 1:29 PM, Josh Berkus <j...@agliodbs.com> wrote:
We made some changes which decreased query cancel (optimizing queries,
turning on hot_standby_feedback) and we haven't seen a segfault since
then.  As far as the user is concerned, this solves the problem, so I'm
never going to get a trace or a core dump file.

Forgot a major piece of evidence as to why I think this is related to
query cancel:  in each case, the segfault was preceded by a
multi-backend query cancel 3ms to 30ms beforehand.  It is possible that
the backend running the query which segfaulted might have been the only
backend *not* cancelled due to query conflict concurrently.
Contradicting this, there are other multi-backend query cancels in the
logs which do NOT produce a segfault.

I wonder if it would be useful to add additional instrumentation so
that even without a core dump, there was some cursory information
about the nature of a segfault.

Yes, doing something with a SIGSEGV handler is very scary, and there
are major portability concerns (e.g.
https://bugs.ruby-lang.org/issues/9654), but I believe it can be made
robust on Linux. For what it's worth, this open source project offers
that kind of functionality in the form of a library:
https://github.com/vmarkovtsev/DeathHandler

Perhaps we should also officially recommend production servers be set up to 
create core files. AFAIK the only downside is the time it would take to write a 
core that's huge because of shared buffers, but perhaps there's some way to 
avoid writing those? (That means the core won't help if the bug is due to 
something in a buffer, but that seems unlikely enough that the tradeoff is 
worth it...)
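
One possible way to do that on Linux, sketched below, is the per-process
/proc/<pid>/coredump_filter mask described in core(5): clearing bit 1
leaves anonymous shared mappings out of the core, and the mask is inherited
across fork(). Whether the shared buffers actually fall into that category
depends on the PostgreSQL version and on how the shared memory was
allocated, so treat that as an assumption to verify. The function names are
hypothetical and none of this is existing PostgreSQL code.

```c
#include <stdio.h>
#include <sys/resource.h>

/* Bit 1 of coredump_filter: dump anonymous shared mappings (core(5)). */
#define COREDUMP_ANON_SHARED 0x02UL

/* Pure helper: the same mask with the anonymous-shared bit cleared. */
unsigned long
filter_without_anon_shared(unsigned long mask)
{
    return mask & ~COREDUMP_ANON_SHARED;
}

/* Raise the soft core-size limit to the hard limit; 0 on success. */
int
raise_core_limit(void)
{
    struct rlimit rl;

    if (getrlimit(RLIMIT_CORE, &rl) != 0)
        return -1;
    rl.rlim_cur = rl.rlim_max;
    return setrlimit(RLIMIT_CORE, &rl);
}

/* Rewrite this process's coredump_filter so that anonymous shared
 * memory is skipped.  Returns 0 on success, -1 on failure (for example
 * on a kernel without coredump_filter support). */
int
exclude_shared_memory_from_core(void)
{
    const char *path = "/proc/self/coredump_filter";
    unsigned long mask;
    FILE       *f;

    /* The kernel reports the current mask as a bare hex number. */
    f = fopen(path, "r");
    if (f == NULL)
        return -1;
    if (fscanf(f, "%lx", &mask) != 1)
    {
        fclose(f);
        return -1;
    }
    fclose(f);

    f = fopen(path, "w");
    if (f == NULL)
        return -1;
    if (fprintf(f, "0x%lx\n", filter_without_anon_shared(mask)) < 0)
    {
        fclose(f);
        return -1;
    }
    return fclose(f) == 0 ? 0 : -1;
}
```

With the usual default mask of 0x33 this would write 0x31. Note that if the
hard RLIMIT_CORE is itself 0, raise_core_limit() succeeds but still leaves
core files disabled, so the limit has to be set by whatever starts the
server.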
--
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
