On Tue, Jan 3, 2017 at 1:05 PM Peter Eisentraut <peter.eisentr...@2ndquadrant.com> wrote:
>
> On 11/7/16 5:31 PM, Merlin Moncure wrote:
> > Regardless, it seems like you might be on to something, and I'm
> > inclined to patch your change, test it, and roll it out to production.
> > If it helps or at least narrows the problem down, we ought to give it
> > consideration for inclusion (unless someone else can think of a good
> > reason not to do that, heh!).
>
> Any results yet?

Not yet, but I do have some interesting findings. At this point I do not think the problem is within pl/sh itself; rather, when a process invoked from pl/sh misbehaves, that misbehavior can penetrate into the database processes. I also believe that this problem is fd related, so that setting 'close on exec' might reasonably fix it. All cases of database damage I have observed remain completely mitigated by enabling database checksums.

Recently, a sqsh process kicked off via pl/sh crashed with signal 11, but the database process was otherwise intact and fine. This is strong supporting evidence for my points above, I think. I've also turned up a fairly reliable reproduction case from some unrelated application changes. If I can demonstrate that the close-on-exec flag works and prevents these occurrences, we can close the book on this.

merlin
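
For anyone following along, here is a minimal sketch of what setting 'close on exec' on a descriptor looks like with plain POSIX calls. This is not the pl/sh code and not the patch discussed upthread; the helper names set_cloexec and open_pipe_cloexec are made up for illustration, and which descriptors would actually need the flag in a backend is an open question here.

```c
#include <fcntl.h>
#include <unistd.h>

/*
 * Hypothetical helper: set FD_CLOEXEC on one descriptor so the kernel
 * closes it automatically when the process calls exec().
 * Returns 0 on success, -1 on error (for example, an invalid fd).
 */
static int set_cloexec(int fd)
{
    int flags = fcntl(fd, F_GETFD);

    if (flags == -1)
        return -1;
    /* Preserve any existing descriptor flags and add FD_CLOEXEC. */
    if (fcntl(fd, F_SETFD, flags | FD_CLOEXEC) == -1)
        return -1;
    return 0;
}

/*
 * Hypothetical helper: create a pipe whose two ends are both marked
 * close-on-exec, so a program started later via exec() cannot see them.
 * Returns 0 on success, -1 on error (both ends are closed on failure).
 */
static int open_pipe_cloexec(int fds[2])
{
    if (pipe(fds) == -1)
        return -1;
    if (set_cloexec(fds[0]) == -1 || set_cloexec(fds[1]) == -1)
    {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }
    return 0;
}
```

The flag is per descriptor and survives fork(), so it has to be set in the parent before the child is spawned. A descriptor without the flag stays open in whatever the child execs, which is the leak path suspected above.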