On Mon, Nov 22, 2010 at 6:54 AM, Heikki Linnakangas
<heikki.linnakan...@enterprisedb.com> wrote:
> On 21.11.2010 15:18, Robert Haas wrote:
>>
>> On Sat, Nov 20, 2010 at 4:07 PM, Tom Lane<t...@sss.pgh.pa.us> wrote:
>>>
>>> Robert Haas<robertmh...@gmail.com> writes:
>>>>
>>>> So what DO we need to guard against here?
>>>
>>> I think the general problem can be stated as "process A changes two or
>>> more values in shared memory in a fairly short span of time, and process
>>> B, which is concurrently examining the same variables, sees those
>>> changes occur in a different order than A thought it made them in".
>>>
>>> In practice we do not need to worry about changes made with a kernel
>>> call in between, as any sort of context swap will cause the kernel to
>>> force cache synchronization.
>>>
>>> Also, the intention is that the locking primitives will take care of
>>> this for any shared structures that are protected by a lock. (There
>>> were some comments upthread suggesting maybe our lock code is not
>>> bulletproof; but if so that's something to fix in the lock code, not
>>> a logic error in code using the locks.)
>>>
>>> So what this boils down to is being an issue for shared data structures
>>> that we access without using locks. As, for example, the latch
>>> structures.
>>
>> So is the problem case a race involving owning/disowning a latch vs.
>> setting that same latch?
>
> No. (or maybe that as well, but that's not what we've been concerned about
> here). As far as I've understood correctly, the problem is that process A
> does something like this:
>
> /* set a shared variable */
> ((volatile bool *) shmem)->variable = true;
> /* Wake up process B to notice that we changed the variable */
> SetLatch();
>
> And process B does this:
>
> for (;;)
> {
>   ResetLatch();
>   if (((volatile bool *) shmem)->variable)
>     DoStuff();
>
>   WaitLatch();
> }
>
> This is the documented usage pattern of latches. The problem arises if
> process A runs just before ResetLatch, but the effect of setting the shared
> variable doesn't become visible until after the if-test in process B.
> Process B will clear the is_set flag in ResetLatch(), but it will not
> DoStuff(), so it in effect misses the wakeup from process A and goes back to
> sleep even though it would have work to do.
>
> This situation doesn't arise in the current use of latches, because the
> shared state comparable to shmem->variable in the above example is protected
> by a spinlock. But it might become an issue in some future use case.
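
To make the ordering requirement in the quoted example concrete, here is a minimal sketch of the same two-process pattern written with C11 atomics. This is only an illustration under stated assumptions: it is not PostgreSQL's latch code, it assumes a compiler with <stdatomic.h>, and every name starting with Sketch or sketch_ is hypothetical. Sequentially consistent stores and loads stand in for whatever barriers a real fix would use.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the shared state; NOT PostgreSQL's latch code. */
typedef struct
{
    atomic_bool is_set;     /* plays the role of the latch's is_set flag */
} SketchLatch;

typedef struct
{
    atomic_bool variable;   /* plays the role of shmem->variable */
    SketchLatch latch;
} SketchShmem;

/* Put both flags into their initial "nothing to do" state. */
static inline void
sketch_init(SketchShmem *shm)
{
    assert(shm != NULL);
    atomic_init(&shm->variable, false);
    atomic_init(&shm->latch.is_set, false);
}

/* Process A: publish the change first, then set the latch. */
static inline void
sketch_signal(SketchShmem *shm)
{
    atomic_store_explicit(&shm->variable, true, memory_order_seq_cst);
    atomic_store_explicit(&shm->latch.is_set, true, memory_order_seq_cst);
}

/* True if a wait on this latch would return immediately. */
static inline bool
sketch_latch_is_set(SketchLatch *latch)
{
    return atomic_load_explicit(&latch->is_set, memory_order_seq_cst);
}

/*
 * Process B: one loop iteration up to, but not including, the wait.
 * Resets the latch first, then reads the shared flag; returns true if
 * there is work to do.
 */
static inline bool
sketch_poll_once(SketchShmem *shm)
{
    atomic_store_explicit(&shm->latch.is_set, false, memory_order_seq_cst);
    return atomic_load_explicit(&shm->variable, memory_order_seq_cst);
}
```

With these orderings, if B's read of the flag in sketch_poll_once() returns false, that read comes before A's store to the flag in the single total order of sequentially consistent operations. B's latch reset therefore also comes before A's latch set, so the latch is still set when B goes to wait and the wakeup is not lost. Note that B needs its store (the reset) ordered before its later load (the flag), which acquire/release ordering alone would not guarantee in this sketch.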
Eh, so, should we do anything about this?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company