Re: [HACKERS] Multithreaded SIGPIPE race in libpq on Solaris
On 29 August 2014 01:04, Thomas Munro wrote:
> On 28 August 2014 23:45, Tom Lane wrote:
>> I don't claim to be an expert on this stuff, but I had the idea that
>> multithreaded environments were supposed to track signal state per-thread,
>> not just per-process, precisely because of issues like this.
>
> After some googling, I found reply #3 in
> https://community.oracle.com/thread/1950900?start=0&tstart=0 and
> various other sources which say that in Solaris 10 they changed
> SIGPIPE delivery from per-process (as specified by UNIX98) to
> per-thread (as specified by POSIX:2001). But we are on version 11, so
> my theory doesn't look great. (Though 9 is probably still in use out
> there somewhere...)

I discovered that the machine we saw the problem on was running Solaris 9 at the time, but it has since been upgraded. So my sigwait race theory may have been correct, but we can put this down to a historical quirk and forget about it, because Solaris 9 is effectively dead ("extended support" ends in October 2014). Sorry for the noise.

Best regards,
Thomas Munro

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Multithreaded SIGPIPE race in libpq on Solaris
On 28 August 2014 23:45, Tom Lane wrote:
> I don't claim to be an expert on this stuff, but I had the idea that
> multithreaded environments were supposed to track signal state per-thread,
> not just per-process, precisely because of issues like this.

After some googling, I found reply #3 in
https://community.oracle.com/thread/1950900?start=0&tstart=0 and
various other sources which say that in Solaris 10 they changed SIGPIPE
delivery from per-process (as specified by UNIX98) to per-thread (as
specified by POSIX:2001). But we are on version 11, so my theory doesn't
look great. (Though 9 is probably still in use out there somewhere...)

I also found this article:
http://krokisplace.blogspot.co.uk/2010/02/suppressing-sigpipe-in-library.html

The author recommends an approach nearly identical to the PostgreSQL one, except that they say: "to do this we use sigtimedwait() with zero timeout; this is to avoid blocking in a scenario where malicious user sent SIGPIPE manually to a whole process: in this case we will see it pending, but other thread may handle it before we had a [chance] to wait for it".

Maybe we have malicious users sending signals to processes. It seems more likely that the crashing database triggered this somehow, though, perhaps in combination with something else the client app was doing; but I can't think what could eat another thread's SIGPIPE between the sigpending and sigwait syscalls.

Best regards,
Thomas Munro
Re: [HACKERS] Multithreaded SIGPIPE race in libpq on Solaris
Thomas Munro writes:
> My theory is that if two connections accessed by different threads get
> shut down around the same time, there is a race scenario where each of
> them fails to write to its socket, sees errno == EPIPE and then sees a
> pending SIGPIPE with sigpending(), but only one thread returns from
> sigwait() due to signal merging.

Hm, that does sound like it could be a problem, if the platform fails to track pending SIGPIPE on a per-thread basis.

> We never saw the problem again after we made the following change:
> ...
> Does this make any sense?

I don't think that patch would fix the problem if it's real. It would prevent libpq from hanging when it's trying to throw away a pending SIGPIPE, but the fundamental issue is that that action could cause a SIGPIPE that's meant for some other thread to get lost; and that other thread isn't necessarily doing anything with libpq.

I don't claim to be an expert on this stuff, but I had the idea that multithreaded environments were supposed to track signal state per-thread, not just per-process, precisely because of issues like this.

regards, tom lane