Hello, I would like to implement the following changes to add initial SMP support to Squid. The changes are relatively simple but should allow us to reap a lot of SMP benefits in many real deployments. I hope to submit the results for review and trunk inclusion in about 45 days. More sophisticated changes will follow, guided by real and lab performance data.
The plan is based on the SMP-related discussions we had in the past few months. I factored in the apparent lack of developers available to tackle more ambitious designs, the current state of the code, and our historical inability to handle huge projects without slipping off schedule and creating various disasters along the way.

1. Processes versus threads:

The initial design is process-based, not thread-based. Processes may be converted to threads and/or threads may be added to certain processes later. If we start with threads now, we may drown in problems caused by poor encapsulation of some of the major code pieces and thread-unreadiness of basic Squid libraries.

2. Building blocks:

A Squid process dedicated to a subset of squid.conf options is a building block. The user can configure Squid to launch multiple such processes. Option subsets are likely to overlap a lot because many options would be the same for each process.

In the initial implementation, _all_ subsets will be identical. In other words, all Squid processes will be identically configured and, hence, "do the same thing".

The main Squid process will need to open http_port(s) and other listening sockets. It will either do it before forking child Squids or will pass open socket descriptors to child Squids via sendmsg(2).

In the initial implementation, the admin will be able to specify which CPU core(s) should be used for Squid. Eventually, it will be possible to map individual building blocks to individual CPU cores.

3. What is expected to work:

SMP-scalable performance on general workloads. For example, if you have 8 CPU cores, you can utilize all of them.

Squid will behave as a single instance with respect to misses, reconfiguration, access logging, and the mgr:info part of the cache manager interface.

4. Limitations:

Caching in the initial implementation is not shared and not synchronized.
Options that require exclusive, single-process access such as a single source port for HTCP queries will not be supported in SMP mode. Eventually, the associated functionality can be adjusted to work with multiple processes or threads.

When logging via a pipe to a program, multiple program instances would be launched.

I am sure other limitations will surface.

5. Next steps:

After the initial design is implemented, we will add support for shared or at least synchronized cache. Also, more cache manager pages will aggregate information from all processes.

Please review.

Thank you,

Alex.
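
P.S. For illustration only, here is a minimal sketch of the second option from item 2: the main process opens a listening socket and hands the open descriptor to a child over a Unix socket pair. It is not Squid code. It is written in Python for brevity and assumes a Unix system with Python 3.9 or later, where socket.send_fds() and socket.recv_fds() wrap sendmsg(2) and recvmsg(2) with SCM_RIGHTS ancillary data. The names open_listener, worker, and demo are hypothetical, and the child deliberately closes its fork-inherited copy of the listener so that it can only use the descriptor it receives.

```python
import os
import socket


def open_listener(host="127.0.0.1", port=0):
    """Main process: open a listening TCP socket (port 0 picks a free port)."""
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    s.bind((host, port))
    s.listen(16)
    return s


def worker(channel):
    """Child process: receive the listening descriptor, serve one connection."""
    # recv_fds() returns (data, fds, msg_flags, address).
    _data, fds, _flags, _addr = socket.recv_fds(channel, 16, 1)
    listener = socket.socket(fileno=fds[0])  # wrap the received descriptor
    conn, _peer = listener.accept()
    with conn:
        conn.sendall(b"served by pid %d\n" % os.getpid())
    listener.close()


def demo():
    """Fork one child, pass it the listener, and connect as a client."""
    listener = open_listener()
    port = listener.getsockname()[1]
    parent_end, child_end = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)

    pid = os.fork()
    if pid == 0:
        # Child: drop the inherited copy so only the passed descriptor is used.
        parent_end.close()
        listener.close()
        try:
            worker(child_end)
        finally:
            os._exit(0)

    # Parent: send the descriptor, then close the local copy.
    child_end.close()
    socket.send_fds(parent_end, [b"listener"], [listener.fileno()])
    listener.close()

    with socket.create_connection(("127.0.0.1", port)) as client:
        reply = client.recv(64)

    os.waitpid(pid, 0)
    parent_end.close()
    return pid, reply


if __name__ == "__main__":
    child_pid, reply = demo()
    print(child_pid, reply)
```

The first option in item 2 (open the sockets before forking) needs no descriptor passing at all, since children inherit open descriptors across fork(2). The sendmsg(2) route is the one that also works for children that are already running when the socket is opened.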