On Friday 15 September 2006 16:41, George Woltman wrote:
> At 03:00 AM 9/15/2006, you wrote:
> >two CPU cores sharing a common L2 cache don't have to send 'cache snoop'
> >cycles and other coherency overhead over the bus. In a separate
> >cache-per-CPU system, these snoop cycles can use quite a lot of time
> >when you have frequent cache misses (and, even worse, when both CPUs are
> >frequently writing to the same location for interprocess communication
> >or whatever, that cache line has to be repeatedly flushed from each cache).
>
> I have assumed that each thread sharing read-only sin/cos data
> incurs no penalty.   Is this correct?   Do cache-snoop penalties only
> occur when writing data?

Doesn't this depend on how the data segments are tagged? If the cached memory 
comes from a segment tagged read-only, there should be no overhead. If the 
cache contents come from a writable segment, an overhead exists: either each 
access must check that the cache contents are still valid (that the same 
address hasn't been written to by another core), or (more probably) each data 
write must mark the corresponding cache lines in the other cores' caches 
invalid. This should be doable in hardware when the cores and caches share 
the same die, though there would probably still be the loss of a clock cycle 
every time it is required.
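The write-invalidation cost shows up even when two cores never touch the same variable, only the same cache line. A minimal sketch (assuming POSIX threads and 64-byte cache lines; the struct and function names are made up for illustration) of the usual remedy, padding each core's data onto its own line:

```c
/* Sketch: two threads each bump their own counter.  If the two
   counters shared one 64-byte cache line, every write by one core
   would invalidate the other core's copy of the line ("false
   sharing").  Padding gives each counter its own line, so the
   semantics are identical but the coherency traffic disappears. */
#include <pthread.h>
#include <stdint.h>

#define CACHE_LINE 64
#define ITERS 1000000

struct padded_counters {
    uint64_t a;
    char pad[CACHE_LINE - sizeof(uint64_t)]; /* push b to the next line */
    uint64_t b;
};

static struct padded_counters counters;

static void *bump_a(void *arg) {
    (void)arg;
    for (long i = 0; i < ITERS; i++) counters.a++;
    return 0;
}

static void *bump_b(void *arg) {
    (void)arg;
    for (long i = 0; i < ITERS; i++) counters.b++;
    return 0;
}

/* Run both threads to completion and return the combined count. */
uint64_t run_counters(void) {
    pthread_t ta, tb;
    pthread_create(&ta, 0, bump_a, 0);
    pthread_create(&tb, 0, bump_b, 0);
    pthread_join(ta, 0);
    pthread_join(tb, 0);
    return counters.a + counters.b;
}
```

With the padding removed the program still computes the same totals, only slower, which is why shared read-only data (like the sin/cos tables) is cheap while shared writable data is not.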

Probably the best way to organise memory in an LL test is to mark the input 
workspace vector read-only and the output workspace vector writable, then 
flush the caches & swap the tags between iterations.

Regards
Brian Beesley
_______________________________________________
Prime mailing list
[email protected]
http://hogranch.com/mailman/listinfo/prime
