Tom Lane wrote:

Manfred Spraul <[EMAIL PROTECTED]> writes:

Tom Lane wrote:

The bigger problem here is that the SMP locking bottlenecks we are
currently seeing are *hardware* issues (AFAICT anyway).  The only way
that futexes can offer a performance win is if they have a smarter way
of executing the basic atomic-test-and-set sequence than we do;

lwlock operations are not a basic atomic-test-and-set sequence. They are: spinlock acquire, several non-atomic operations, spinlock release.

Right, and it is the spinlock that is the problem. See discussions a few months back: at least on Intel SMP machines, most of the problem seems to have to do with trading the spinlock's cache line back and forth between CPUs.

I'd disagree: cache line bouncing is only one problem. If it happens, then there is only one solution: the number of changes to that cache line must be reduced. The tools that are used in the Linux kernel are:
- hashing. An emergency approach if there is no other solution. I think Red Hat used it for the buffer cache in RH AS: instead of one buffer cache, there were lots of smaller buffer caches with individual locks. The cache was chosen based on the file position (probably mixed with some pointers to avoid overloading cache 0).
- For read-heavy loads: sequence locks. A reader reads a counter value and then accesses the data structure. At the end it checks if the counter was modified. If it's still the same value then it can continue, otherwise it must retry. Writers acquire a normal spinlock and then modify the counter value. RCU is the second option, but there are patents - please be careful before using that tool.
- complete rewrites that avoid the global lock. I think the global buffer cache is now gone, and everything is handled per-file. I think there is a global list for buffer replacement, but at the top of the buffer replacement strategy is a simple clock algorithm. That means that simple lookups/accesses just set a (local) referenced bit and don't have to acquire a global lock. I know that this is the total opposite of ARC, but perhaps it's the only scalable solution. ARC could be used as the second-level strategy.
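
A minimal sketch of the sequence lock idea from the list above, to make the reader/writer protocol concrete. This is not the Linux kernel's seqlock code: the names (seq_pair, seq_pair_read, seq_pair_write) are made up for illustration, a pthread mutex stands in for the writer's spinlock, and the protected fields are C11 atomics with the default (strongest) memory ordering so that the sketch stays obviously correct. A tuned version would presumably use weaker orderings. One more assumption: the writer bumps the counter both before and after the update, so an odd value tells a reader that a write is in progress.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>

/* Hypothetical example structure; all names are illustrative only. */
typedef struct {
    atomic_uint seq;        /* even: data stable, odd: write in progress */
    pthread_mutex_t wlock;  /* serializes writers (stand-in for a spinlock) */
    atomic_long a;          /* protected data, kept atomic so that a */
    atomic_long b;          /* racing reader is still well-defined   */
} seq_pair;

void seq_pair_init(seq_pair *p)
{
    atomic_init(&p->seq, 0);
    atomic_init(&p->a, 0);
    atomic_init(&p->b, 0);
    pthread_mutex_init(&p->wlock, NULL);
}

/* Writer: take the normal lock, then bump the counter around the update. */
void seq_pair_write(seq_pair *p, long a, long b)
{
    pthread_mutex_lock(&p->wlock);
    unsigned before = atomic_fetch_add(&p->seq, 1);  /* counter becomes odd */
    assert((before & 1u) == 0);                      /* no write was in progress */
    atomic_store(&p->a, a);
    atomic_store(&p->b, b);
    atomic_fetch_add(&p->seq, 1);                    /* counter becomes even again */
    pthread_mutex_unlock(&p->wlock);
}

/* Reader: no lock and no store to shared memory; retry if the counter moved. */
void seq_pair_read(seq_pair *p, long *a, long *b)
{
    unsigned s1, s2;
    do {
        s1 = atomic_load(&p->seq);
        *a = atomic_load(&p->a);
        *b = atomic_load(&p->b);
        s2 = atomic_load(&p->seq);
    } while ((s1 & 1u) != 0 || s1 != s2);
}
```

The point for the cache line discussion is that seq_pair_read never writes to the shared structure, so concurrent readers do not pull the cache line away from each other; only writers do.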

But: According to the descriptions the problem is a context switch storm. I don't see that cache line bouncing can cause a context switch storm. What causes the context switch storm? If it's the pg_usleep in s_lock, then my patch should help a lot: with pthread_rwlock locks, this line doesn't exist anymore.

