On 07/14/10 18:36, Andrej Podzimek wrote:
I think this could be a memory-related issue, but I'm not sure about it.
I reduced the number of callbacks by a factor of 1000 and reran the
benchmark. Based on an estimate of 100 callbacks per second, it should
have taken about 80 seconds. But this time the pathological case did not
occur: the benchmark completed in a fraction of a second.

But there's still something wrong with this theory: when I looked at the
memory statistics during the benchmark with the original high number of
callbacks, there was *no* evidence of memory pressure.

Is the pointer given to each callback:

static void
callback(uint32_t *counter) {
	atomic_dec_32(counter);	/* Solaris atomic op, declared in <atomic.h> */
}

on its own cache line or do you have everyone hammering
the same cache line?

There are currently 8 32-bit counters in an array, so they all fit into a single cache line.
I believe this is what Martin was referring to:

Look at how arrays of mutexes are defined in Solaris: they are
explicitly padded to avoid cache-line contention when two CPUs work on
adjacent mutexes simultaneously. In your case I would expect a lot more
contention, since all eight CPUs are contending for the same line.

Modify the array into an array of structures containing a union, the
first member of which is the counter and the second a char[] of
cache-line size, and you should see some improvement whether you access
it from the kernel or a userland program.
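
A minimal sketch of that layout, assuming a 64-byte cache line; the
CACHE_LINE constant, the type name, and the array size are placeholders,
not code from the original benchmark:

#include <inttypes.h>
#include <atomic.h>	/* atomic_dec_32(); use <sys/atomic.h> in the kernel */

#define	CACHE_LINE	64	/* assumed cache-line size */

/*
 * One counter per cache line: CPUs decrementing different
 * counters no longer invalidate each other's lines.
 */
typedef union {
	uint32_t	counter;
	char		pad[CACHE_LINE];
} padded_counter_t;

static padded_counter_t counters[8];

static void
callback(uint32_t *counter)
{
	atomic_dec_32(counter);	/* callback body unchanged */
}

Callers would then pass &counters[i].counter instead of indexing a flat
uint32_t[8]. The array grows from 32 bytes to 512, but because adjacent
counters are now 64 bytes apart, no two of them can share a line, so
each atomic decrement ping-pongs only its own line between CPUs.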
-Surya