Nick Lewycky wrote:
> Hi. I've been working to add prefetching to squid3. It works by
> analyzing HTML and looking for various tags that a graphical browser can
> be expected to request.
>
> So far, it seems to just barely work. What works is checking the
> content-type of the document, avoiding encoded (gzip'ed) documents,
> analyzing the HTML using libxml2 in "tag soup" mode, resolving the full
> URL from relative references, and fetching the files into the cache. (I
> would, of course, appreciate code reviews of the branch before I diverge
> too far!)
>
> However, I've run into a few problems.
>
> To prefetch a page, we call clientBeginRequest. I've already had to
> extend the richness of this interface a little. The main problem is that
> it opens a new socket for each call. On a page with 100 prefetchables,
> it will open 100 TCP connections to the remote server. That's not nice.
> I need a way to re-use a connection for multiple requests. How should I
> do this? I'd like clientBeginRequest to be smart enough to handle this
> behind the scenes.
>
> Occasionally I see duplicate prefetches. I think what's going on here is
> that the object is uncacheable. The only way I can think of solving this
> is by adding an "uncacheable" entry type to the store -- but that just
> seems wrong, conceptually. On a related note, maybe we could terminate a
> prefetch as soon as we receive the headers and notice that it's
> uncacheable. Currently, we download the whole thing and just discard it
> (after analyzing it for more prefetchables if it's HTML).
>
> Finally, does anyone have suggestions for how to test for the
> performance improvement due to prefetching?
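For the tag-extraction step, a minimal sketch of the approach described
above might look like the following. This is illustrative only, not the
code on the branch: it uses libxml2's tag-soup HTML parser via
htmlReadMemory(), pulls out a few attributes a graphical browser can be
expected to fetch, and resolves relative references against the page's
base URL with xmlBuildURI(). The function names are invented for the
example.

#include <libxml/HTMLparser.h>
#include <libxml/uri.h>
#include <cstdio>

// Recursively walk the parsed tree looking for prefetchable references.
static void scanNode(xmlNode *node, const char *base)
{
    for (xmlNode *cur = node; cur; cur = cur->next) {
        if (cur->type == XML_ELEMENT_NODE) {
            const char *attr = NULL;
            if (!xmlStrcasecmp(cur->name, BAD_CAST "img") ||
                !xmlStrcasecmp(cur->name, BAD_CAST "script"))
                attr = "src";
            else if (!xmlStrcasecmp(cur->name, BAD_CAST "link"))
                attr = "href";   // e.g. stylesheets

            if (attr) {
                xmlChar *ref = xmlGetProp(cur, BAD_CAST attr);
                if (ref) {
                    // Resolve a relative reference to a full URL.
                    xmlChar *abs = xmlBuildURI(ref, BAD_CAST base);
                    if (abs) {
                        printf("prefetch candidate: %s\n", (const char *)abs);
                        xmlFree(abs);
                    }
                    xmlFree(ref);
                }
            }
        }
        scanNode(cur->children, base);
    }
}

void findPrefetchables(const char *html, int len, const char *baseUrl)
{
    // HTML_PARSE_RECOVER is the "tag soup" mode: keep going on bad HTML.
    htmlDocPtr doc = htmlReadMemory(html, len, baseUrl, NULL,
                                    HTML_PARSE_RECOVER | HTML_PARSE_NOERROR |
                                    HTML_PARSE_NOWARNING);
    if (!doc)
        return;
    scanNode(xmlDocGetRootElement(doc), baseUrl);
    xmlFreeDoc(doc);
}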
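On connection re-use: one generic way to get persistent connections
behind a call like clientBeginRequest is an idle-connection pool keyed by
host:port, so a prefetch grabs an already-open server socket when one
exists instead of dialing a fresh one per request. The sketch below is
deliberately generic and is not Squid's actual persistent-connection
machinery; all names are invented.

#include <map>
#include <string>
#include <cstdio>

// "host:port" -> fd of an idle, keep-alive server connection.
static std::map<std::string, int> idleConns;

static std::string poolKey(const std::string &host, int port)
{
    char buf[16];
    snprintf(buf, sizeof buf, ":%d", port);
    return host + buf;
}

// Returns an idle connection to re-use, or -1 if the caller must open one.
int getServerConn(const std::string &host, int port)
{
    std::map<std::string, int>::iterator it =
        idleConns.find(poolKey(host, port));
    if (it == idleConns.end())
        return -1;
    int fd = it->second;
    idleConns.erase(it);      // hand ownership back to the caller
    return fd;
}

// Called when a reply finishes on a connection the server kept alive.
void putServerConn(const std::string &host, int port, int fd)
{
    idleConns[poolKey(host, port)] = fd;
}

A real implementation would also cap the pool size, time idle
connections out, and cope with the server closing a socket while it sits
in the pool.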
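For the duplicate prefetches, an alternative to adding an "uncacheable"
entry type to the store is to track prefetch URLs in a small side table,
independent of cacheability. Again a sketch with invented names:

#include <set>
#include <string>

static std::set<std::string> inFlightPrefetches;

// True if nobody is already prefetching this URL; marks it in flight.
bool beginPrefetch(const std::string &url)
{
    return inFlightPrefetches.insert(url).second;
}

// Call when the prefetch completes or is aborted.
void endPrefetch(const std::string &url)
{
    inFlightPrefetches.erase(url);
}

Keeping finished URLs in the table for a short expiry period would also
cover repeats of uncacheable objects across pages without touching the
store, and it pairs naturally with the early-termination idea: once the
reply headers show the object is uncacheable, abort the fetch (unless the
body is HTML you still want to analyze) and expire the entry.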
A good way to test how your algorithms are working is to get a nice, long
trace of an actual Squid workload (e.g., the URLs fetched) and compare how
long it takes to replay the whole thing with and without prefetching. Note
that you generally have to prefetch a LOT of stuff to get much improvement,
because web cache fetch popularity follows Zipf's law and decays slowly.

Good luck with your work.

Jon
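For concreteness, a trace replay along those lines could be as simple as
the sketch below, which times one sequential pass over a URL list through
the proxy. It assumes libcurl and a Squid listening on localhost:3128;
nothing here comes from the original posts.

#include <curl/curl.h>
#include <cstdio>
#include <cstring>
#include <ctime>

// Throw the body away; we only care about total fetch time.
static size_t discard(char *, size_t sz, size_t n, void *) { return sz * n; }

int main(int argc, char **argv)
{
    if (argc < 2) {
        fprintf(stderr, "usage: %s url-list.txt\n", argv[0]);
        return 1;
    }
    FILE *urls = fopen(argv[1], "r");
    if (!urls)
        return 1;

    curl_global_init(CURL_GLOBAL_DEFAULT);
    CURL *h = curl_easy_init();
    curl_easy_setopt(h, CURLOPT_PROXY, "http://localhost:3128");
    curl_easy_setopt(h, CURLOPT_WRITEFUNCTION, discard);

    char line[4096];
    time_t start = time(NULL);
    while (fgets(line, sizeof line, urls)) {
        line[strcspn(line, "\r\n")] = '\0';
        curl_easy_setopt(h, CURLOPT_URL, line);
        curl_easy_perform(h);   // a real harness would log failures
    }
    printf("replayed trace in %ld seconds\n", (long)(time(NULL) - start));

    curl_easy_cleanup(h);
    curl_global_cleanup();
    fclose(urls);
    return 0;
}

Run it against the same trace with a cold cache each time, once with
prefetching enabled and once without.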
