Re: [HACKERS] Group clear xid can leak semaphore count
On Fri, Jan 6, 2017 at 1:13 AM, Robert Haas wrote: > On Sat, Dec 31, 2016 at 12:44 AM, Amit Kapila wrote: >> During the review of Group update Clog patch [1], Dilip noticed an >> issue with the patch where it can leak the semaphore count in one of >> the corner case. I have checked and found that similar issue exists >> for Group clear xid (ProcArrayGroupClearXid) as well. I think the >> reason why this problem is not noticed by anyone till now is that it >> can happen only in a rare scenario when the backend waiting for xid >> clear is woken up due to some unrelated signal. This problem didn't >> exist in the original commit >> (0e141c0fbb211bdd23783afa731e3eef95c9ad7a) of the patch, but later >> while fixing some issues in the committed patch, it got introduced in >> commit 4aec49899e5782247e134f94ce1c6ee926f88e1c. Patch to fix the >> issue is attached. > > Yeah, that looks like a bug. Thanks for the detailed analysis; > committed and back-patched to 9.6. > Thanks for committing the fix. I have updated the CF app entry for this patch. -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Group clear xid can leak semaphore count
On Thu, Jan 5, 2017 at 4:48 PM, Tom Lane wrote: > Robert Haas writes: >> I think we have run into this kind of issue before. I wonder if >> there's any way to insert some kind of a guard - e.g. detect at >> backend startup time that the semaphore has a non-zero value and fix >> it, issuing a warning along the way... maybe something like: > > See the PGSemaphoreReset near the end of InitProcess. Oh ho. So the worst consequence of this is a backend-lifetime leak, not a cluster-lifetime leak. That's good, at least. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Group clear xid can leak semaphore count
Robert Haas writes: > I think we have run into this kind of issue before. I wonder if > there's any way to insert some kind of a guard - e.g. detect at > backend startup time that the semaphore has a non-zero value and fix > it, issuing a warning along the way... maybe something like: See the PGSemaphoreReset near the end of InitProcess. regards, tom lane -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] Group clear xid can leak semaphore count
On Sat, Dec 31, 2016 at 12:44 AM, Amit Kapila wrote: > During the review of Group update Clog patch [1], Dilip noticed an > issue with the patch where it can leak the semaphore count in one of > the corner case. I have checked and found that similar issue exists > for Group clear xid (ProcArrayGroupClearXid) as well. I think the > reason why this problem is not noticed by anyone till now is that it > can happen only in a rare scenario when the backend waiting for xid > clear is woken up due to some unrelated signal. This problem didn't > exist in the original commit > (0e141c0fbb211bdd23783afa731e3eef95c9ad7a) of the patch, but later > while fixing some issues in the committed patch, it got introduced in > commit 4aec49899e5782247e134f94ce1c6ee926f88e1c. Patch to fix the > issue is attached. Yeah, that looks like a bug. Thanks for the detailed analysis; committed and back-patched to 9.6. I suppose in the worst case it's possible that we'd leak a semaphore count and then every future time we enter a PGSemaphoreLock using that PGPROC we have to eat up the leaked count (or counts) and then put it (or them) back after we really wait. That would suck. But I wasn't able to observe any leaks in a high-concurrency pgbench test on hydra, so it's either very unlikely or requires some additional condition to trigger the problem. I think we have run into this kind of issue before. I wonder if there's any way to insert some kind of a guard - e.g. detect at backend startup time that the semaphore has a non-zero value and fix it, issuing a warning along the way... maybe something like: while (sem_trywait(sem) == 0) ++bogus; if (bogus > 0) elog(WARNING, "uh oh"); I'm not sure if that would be prone to false positives though. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] Group clear xid can leak semaphore count
During the review of Group update Clog patch [1], Dilip noticed an issue with the patch where it can leak the semaphore count in one of the corner case. I have checked and found that similar issue exists for Group clear xid (ProcArrayGroupClearXid) as well. I think the reason why this problem is not noticed by anyone till now is that it can happen only in a rare scenario when the backend waiting for xid clear is woken up due to some unrelated signal. This problem didn't exist in the original commit (0e141c0fbb211bdd23783afa731e3eef95c9ad7a) of the patch, but later while fixing some issues in the committed patch, it got introduced in commit 4aec49899e5782247e134f94ce1c6ee926f88e1c. Patch to fix the issue is attached. [1] - https://www.postgresql.org/message-id/CAA4eK1J%2B67edo_Wnrfx8oJ%2BrWM_BAr%2Bv6JqvQjKPdLOxR%3D0d5g%40mail.gmail.com -- With Regards, Amit Kapila. EnterpriseDB: http://www.enterprisedb.com fix_absorbed_wakeups_clear_xid_v1.patch Description: Binary data -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers