No, I don't think we are actually analyzing the HTML that is retrieved. What we are doing is reading the access.log file to build a prediction model, so that if a user's most recent request history matches a path in the model, we try to predict the next page that will be requested. For example, if after analyzing access.log one of the paths in the prediction model is a, b, c, d and the user's most recent request history consists of a, b, c, we will fetch d and store it in the cache, expecting the user to request d next. So I don't think we would be able to give you a hand. Sorry. But if it's not too much trouble, could you help us by letting us know where in the code the request for a page is made, and where the returned page is processed?
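The path-matching idea above can be illustrated with a small order-k model: count which URL followed each run of k URLs in the log, then look up a user's last k requests. This is only a sketch under stated assumptions, not the model described here. It assumes Squid's native access.log layout (time, elapsed, client, code/status, bytes, method, URL, ...). It also assumes that a "user" is identified by client address and that a path is a fixed window of k consecutive requests. The names PathPredictor and sequencesByClient are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <istream>
#include <map>
#include <optional>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical order-k path model: after seeing k consecutive URLs from one
// client, predict the URL that most often followed them in the training log.
class PathPredictor {
public:
    explicit PathPredictor(std::size_t k) : k_(k) { assert(k > 0); }

    // Count every (k URLs -> next URL) transition in one client's request sequence.
    void train(const std::vector<std::string>& seq) {
        for (std::size_t i = 0; i + k_ < seq.size(); ++i) {
            auto first = seq.begin() + static_cast<std::ptrdiff_t>(i);
            std::vector<std::string> context(first, first + static_cast<std::ptrdiff_t>(k_));
            ++counts_[context][seq[i + k_]];
        }
    }

    // Predict the next URL from the last k entries of a client's recent history.
    std::optional<std::string> predict(const std::vector<std::string>& history) const {
        if (history.size() < k_) return std::nullopt;
        std::vector<std::string> context(history.end() - static_cast<std::ptrdiff_t>(k_),
                                         history.end());
        auto it = counts_.find(context);
        if (it == counts_.end()) return std::nullopt;
        // Most frequent follower; ties go to the lexicographically smallest URL.
        auto best = std::max_element(it->second.begin(), it->second.end(),
            [](const auto& a, const auto& b) { return a.second < b.second; });
        return best->first;
    }

private:
    std::size_t k_;
    std::map<std::vector<std::string>, std::map<std::string, int>> counts_;
};

// Group request URLs by client address. Assumes the native access.log layout:
// time elapsed client code/status bytes method URL ...
std::map<std::string, std::vector<std::string>> sequencesByClient(std::istream& log) {
    std::map<std::string, std::vector<std::string>> seqs;
    std::string line;
    while (std::getline(log, line)) {
        std::istringstream fields(line);
        std::string timestamp, elapsed, client, status, bytes, method, url;
        if (fields >> timestamp >> elapsed >> client >> status >> bytes >> method >> url)
            seqs[client].push_back(url);
    }
    return seqs;
}
```

With k = 3, a log in which one client requested a, b, c, d trains the transition (a, b, c) -> d, so predict({a, b, c}) returns d, which would then be fetched into the cache. A real model would probably also need to split sessions (for example on idle gaps) and filter out images and stylesheets. Both are left out here.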
OK, I wasn't sure, as your description lacked detail.
Assuming you start with Squid 3 (still in development, in CVS), you need to deal with ClientStreams. These are horribly underdocumented[1], but you can follow the example of the ESI code (or my code[2], as far as it works).
When data comes in from the origin server, a chain of observers is called to process it. You need to implement four functions: bufferData, streamRead, streamDetach and streamStatus.[3] Then install your stream node in client_side_reply.cc:1926 (right above the "#if ESI" line).
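The observer-chain pattern can be shown with a self-contained mock that uses the four callback names above. It is a hypothetical illustration only. The signatures, the StreamStatus values, and the StreamChain and PageObserver classes are invented here. Squid 3's real ClientStream callbacks take different arguments and are wired up differently. In this mock, data flows from the node nearest the origin server toward the client, and read requests flow the other way.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <utility>
#include <vector>

enum class StreamStatus { Pending, Complete };

// Hypothetical node interface using the four callback names from the text.
// The signatures are made up for illustration.
class StreamNode {
public:
    virtual ~StreamNode() = default;
    virtual void bufferData(const std::string& chunk) = 0;  // a chunk of reply data arrived
    virtual void streamRead() {}                             // downstream asked for more data
    virtual void streamDetach() {}                           // node is being removed from the chain
    virtual StreamStatus streamStatus() const = 0;           // has this node finished?
};

// Records the reply body without altering it, e.g. so a prediction model
// can see which page was just returned.
class PageObserver : public StreamNode {
public:
    void bufferData(const std::string& chunk) override { body_ += chunk; }
    void streamRead() override { ++reads_; }
    void streamDetach() override { detached_ = true; }
    StreamStatus streamStatus() const override {
        return detached_ ? StreamStatus::Complete : StreamStatus::Pending;
    }
    const std::string& body() const { return body_; }
    int reads() const { return reads_; }

private:
    std::string body_;
    int reads_ = 0;
    bool detached_ = false;
};

// Hypothetical chain: nodes_.front() is nearest the origin server.
class StreamChain {
public:
    void add(std::shared_ptr<StreamNode> node) {
        assert(node != nullptr);
        nodes_.push_back(std::move(node));
    }
    // Reply data travels from the origin side toward the client.
    void deliver(const std::string& chunk) {
        for (auto& n : nodes_) n->bufferData(chunk);
    }
    // Read requests travel from the client side toward the origin.
    void read() {
        for (auto it = nodes_.rbegin(); it != nodes_.rend(); ++it) (*it)->streamRead();
    }
    void detachAll() {
        for (auto& n : nodes_) n->streamDetach();
    }
    bool allComplete() const {
        for (const auto& n : nodes_)
            if (n->streamStatus() != StreamStatus::Complete) return false;
        return true;
    }

private:
    std::vector<std::shared_ptr<StreamNode>> nodes_;
};
```

The point of the pattern is that a node like PageObserver can be inserted into an existing chain without the other nodes knowing about it. That is why an extra node can be added next to the ESI one rather than patching the reply code itself.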
You will need to implement two client stream nodes, one to process the returned page, and one to fetch with.
I expect that you'll run into the same problems that I do. My page fetching is implemented in PrefetchStream.cc. You'll need your own page fetcher; feel free to copy mine, and send me patches for any bugs you might fix.
Squid isn't smart enough to collapse incoming requests for the same URL. So if you go out and fetch URL X, and a client request for URL X comes in before the origin server responds, Squid will send a second request to the origin server and wait for that second one to complete before sending the data back to the client. I haven't dealt with this yet, and we'll both need a solution. There is a branch called "collapsed forwarding"[4], but it currently applies only to accelerator setups. Perhaps it could be adapted.
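The core idea of collapsing is a table of in-flight fetches keyed by URL. Only the first request for a URL goes to the origin server. Later requests that arrive before the reply attach to the pending fetch. The following is a minimal, single-threaded C++ sketch of that idea only. It is not the collapsed_forwarding branch's code and does not show how replies would be shared through Squid's store. RequestCollapser, Fetcher and Callback are hypothetical names.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical request collapser: one origin fetch per URL in flight;
// every request made while it is pending receives the same reply.
class RequestCollapser {
public:
    using Callback = std::function<void(const std::string& body)>;
    using Fetcher = std::function<void(const std::string& url)>;  // starts an origin fetch

    explicit RequestCollapser(Fetcher fetch) : fetch_(std::move(fetch)) {
        assert(fetch_ != nullptr);
    }

    // Register interest in url. Returns true if this call started an origin
    // fetch, false if it was collapsed onto a fetch already in flight.
    bool request(const std::string& url, Callback onReply) {
        auto [it, inserted] = pending_.try_emplace(url);
        it->second.push_back(std::move(onReply));  // register before fetching
        if (inserted) fetch_(url);
        return inserted;
    }

    // Deliver the origin reply for url to every waiter, then forget the entry.
    void complete(const std::string& url, const std::string& body) {
        auto it = pending_.find(url);
        if (it == pending_.end()) return;
        std::vector<Callback> waiters = std::move(it->second);
        pending_.erase(it);
        for (auto& cb : waiters)
            if (cb) cb(body);  // a prefetch may register an empty callback
    }

private:
    Fetcher fetch_;
    std::map<std::string, std::vector<Callback>> pending_;
};
```

In the scenario above, the prefetch of URL X calls request() first and starts the origin fetch. The client's request for X, arriving before the reply, is collapsed onto it, and both are served by a single complete() call. The callback is registered before the fetch starts so that a reply delivered synchronously from inside the fetcher is not lost.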
Good luck.
Nick Lewycky
[1] - http://squidwiki.kinkie.it/squidwiki/ClientStreams
[2] - http://devel.squid-cache.org/cgi-bin/diff/prefetching
[3] - http://www.squid-cache.org/Doc/Prog-Guide/prog-guide-8.html#ss8.3
[4] - http://devel.squid-cache.org/projects.html#collapsed_forwarding
