On 07/14/10 18:36, Andrej Podzimek wrote:
> I think this could be a memory-related issue, but I'm not sure about it.
> I reduced the number of callbacks by a factor of 1000 and reran the
> benchmark. Based on an estimate of 100 callbacks per second, it should
> still have taken about 80 seconds. But this time the pathological case
> did not happen: the benchmark completed in a fraction of a second.
>
> There's still something wrong with this theory, though: when I looked
> at the memory statistics during the benchmark with the original high
> number of callbacks, there was *no* evidence of memory pressure.
>> Is the pointer given to each callback:
>>
>>     static void
>>     callback(uint32_t *counter) {
>>             atomic_dec_32(counter);
>>     }
>>
>> on its own cache line, or do you have everyone hammering the same
>> cache line?
> There are currently 8 32-bit counters in an array, so they can all fit
> into one cache line.
I believe this is what Martin was referring to:
Look at how arrays of mutexes are defined in Solaris: they are
explicitly padded to avoid cache line contention when two CPUs are
working on adjacent mutexes simultaneously. In your case, I would
expect a lot more contention, since all eight CPUs are contending for
the same line.
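
For illustration, a minimal sketch of the contended layout as
described above, with the 64-byte cache line size as an assumption
(the actual size depends on the CPU):

    #include <stdint.h>

    /*
     * Eight 32-bit counters packed back to back. Assuming a 64-byte
     * cache line, all 8 * 4 = 32 bytes fit in a single line, so an
     * atomic_dec_32() on any counter invalidates that line in every
     * other CPU's cache (false sharing / cache line ping-pong).
     */
    static uint32_t counters[8];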
Modify the array to be an array of structures with a union element,
the first member of which is the counter and the second a char[] of
cache line size, and you should see some improvement whether you
access it from the kernel or a userland program.
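
A minimal sketch of that layout, assuming user-land C on Solaris and
a 64-byte cache line; the names (padded_counter_t, CACHE_LINE_SIZE,
NCOUNTERS) are mine, not from any original code:

    #include <atomic.h>
    #include <stdint.h>

    #define CACHE_LINE_SIZE 64      /* assumed; depends on the CPU */
    #define NCOUNTERS       8

    /*
     * One counter per cache line: the union pads each element out to
     * CACHE_LINE_SIZE bytes, so adjacent counters can no longer share
     * a line. The array itself should also start on a cache-line
     * boundary for the padding to line up as intended.
     */
    typedef struct padded_counter {
            union {
                    uint32_t pcu_count;               /* the counter */
                    char     pcu_pad[CACHE_LINE_SIZE]; /* line-sized pad */
            } pc_u;
    } padded_counter_t;

    static padded_counter_t counters[NCOUNTERS];

    /*
     * The callback itself is unchanged; callers now pass
     * &counters[i].pc_u.pcu_count instead of &array[i].
     */
    static void
    callback(uint32_t *counter)
    {
            atomic_dec_32(counter);
    }

This mirrors the padding idiom used for the Solaris mutex arrays
mentioned above.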
-Surya