On Friday 15 September 2006 16:41, George Woltman wrote:
> At 03:00 AM 9/15/2006, you wrote:
> >Two CPU cores sharing a common L2 cache don't have to send 'cache snoop'
> >cycles and other coherency overhead over the bus. In a separate
> >cache-per-CPU system, these snoop cycles can use quite a lot of time
> >when you have frequent cache misses (and even worse, when both CPUs are
> >frequently writing to the same location for interprocess communication
> >or whatever, that cache line has to be repeatedly flushed from each cache).
>
> I have assumed that each thread sharing read-only sin/cos data
> incurs no penalty. Is this correct? Do cache-snoop penalties only
> occur when writing data?
Doesn't this depend on how the data segments are tagged? If the cached memory is from segments tagged read-only, then there should be no overhead.

If the cache contents are from a writable segment, then an overhead exists: either each access must check that the cache contents are still valid (i.e. that the same address hasn't been written to by another core), or (more probably) each data write must pay the cost of marking the corresponding cache lines in other cores' caches invalid. This should be doable in hardware when the cores and caches share the same die, though there would still probably be the loss of a clock cycle each time it is required.

Probably the best way to organise memory in an LL test is to mark the input workspace vector read-only and the output workspace vector writable, then flush the caches & swap the tags between iterations.

Regards
Brian Beesley
_______________________________________________
Prime mailing list
[email protected]
http://hogranch.com/mailman/listinfo/prime
