On Mon, Mar 17, 2008, Robert Collins wrote:

> This reminds me, one of the things I was thinking heavily about a few
> years ago was locality of reference in N-CPU situations. That is, making
> sure we don't cause thrashing unnecessarily. For instance - given
> chunking we can't really avoid seeing all the bytes for a MISS, so does
> it matter if we process all of the request on one CPU, or part on one and
> part on another? Given NUMA it clearly does matter, but how many folk
> run squid/want to run squid on a NUMA machine?
Everything you buy now is effectively NUMA-like.

> Or, should we make acl lookups come back to the same cpu, but do all the
> acl lookups on one cpu, trading potential locking (a non-read-blocking
> cache can allow result lookups cheaply) for running the same acl code
> over extended sets of acls. (Not quite SIMD, but think about the problem
> from that angle for a bit).

That's what I'd like to benchmark. More importantly, what will work out
better? Run-to-completion, like -2 and -3 are now, or
run-parts-of-jobs-as-batch-processes? (E.g., run all pending ACL lookups
-now- and queue whatever results can be; run all network IO -now- and
queue whatever results can be; run header parsing -now- and queue
whatever results can be.)

Sufficiently well-written code can be run both ways - it's just a
question of queuing and scheduling.



Adrian

--
- Xenion - http://www.xenion.com.au/ - VPS Hosting - Commercial Squid Support -
- $25/pm entry-level VPSes w/ capped bandwidth charges available in WA -
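P.S. For anyone following along, the batch idea above might be sketched
roughly like this (hypothetical queue and type names, not actual Squid
code): one work queue per phase, and the scheduler drains each queue
completely before moving to the next, so the same code and data stay hot
in cache across many requests instead of interleaving phases per request.

```cpp
// Sketch of "run parts of jobs as batches" scheduling, assuming three
// phases (ACL lookups, network IO, header parsing). All names are
// illustrative; this is not Squid's actual job machinery.
#include <functional>
#include <queue>
#include <utility>

struct Scheduler {
    // One queue per phase; a job in one phase may enqueue follow-up
    // work for a later phase (or for the next pass).
    std::queue<std::function<void()>> aclLookups, networkIO, headerParsing;

    // Drain every pending job in one phase before starting the next,
    // rather than running each request to completion.
    void runBatches() {
        drain(aclLookups);
        drain(networkIO);
        drain(headerParsing);
    }

    static void drain(std::queue<std::function<void()>> &q) {
        while (!q.empty()) {
            auto job = std::move(q.front());
            q.pop();
            job();  // may push follow-up work onto another queue
        }
    }
};
```

Work queued for an earlier phase than the one currently draining simply
waits for the next runBatches() pass - i.e., the next trip around the
event loop - which is where the queuing-and-scheduling question above
actually bites.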
