On Fri, 2008-05-23 at 15:47 +0200, Cezary Rzewuski wrote:
> I've looked into squid's code and I've got an idea how to do this. The
> best place to scan downloaded site seems to be storeSwapOutFileClosed
> function in store_swapout.cc file. After closing file clamav could scan
> this file and log if the site is malicious.

There is a simpler way: use Squid-3 and the c-icap project. This lets you plug ClamAV into Squid quite seamlessly, and it works for all content, including content that is not cached.

If your crawler offloads SSL to the proxy, it will happily scan even https content for you. (Such offloading is done by sending https:// URLs to the proxy as plain requests, without wrapping them in SSL.)

Squid-3: http://www.squid-cache.org/Versions/v3/3.0/
C-ICAP: http://c-icap.sourceforge.net/
Install instructions, including Squid-3 configuration details: http://c-icap.sourceforge.net/install.html

Regards
Henrik
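
To illustrate the SSL offloading mentioned above, here is a minimal sketch of what a crawler might send: an ordinary, unencrypted proxy request whose request line carries the full https:// URL, so the proxy makes the SSL connection itself and sees the content in the clear. The function names, the proxy address 127.0.0.1 and the port 3128 are assumptions for illustration, not taken from any particular crawler, and whether the proxy accepts such requests depends on how it is configured.

```python
import socket
from urllib.parse import urlsplit


def build_proxy_request(url: str) -> bytes:
    """Build a plain HTTP proxy request with the absolute URL in the request line.

    Hypothetical helper: the https:// URL is sent as-is, with no CONNECT
    tunnel and no SSL between the crawler and the proxy.
    """
    parts = urlsplit(url)
    if parts.scheme not in ("http", "https") or not parts.netloc:
        raise ValueError(f"unsupported URL: {url!r}")
    lines = [
        f"GET {url} HTTP/1.0",    # absolute URL, as a proxy expects
        f"Host: {parts.netloc}",
        "Connection: close",
        "",                        # the two empty items produce the blank
        "",                        # line that terminates the header block
    ]
    # Raises UnicodeEncodeError for non-ASCII URLs; escape them beforehand.
    return "\r\n".join(lines).encode("ascii")


def fetch_via_proxy(url: str, proxy_host: str = "127.0.0.1",
                    proxy_port: int = 3128, timeout: float = 30.0) -> bytes:
    """Send the request to the proxy over plain TCP and return the raw reply.

    proxy_host and proxy_port are assumed values; adjust to your setup.
    """
    with socket.create_connection((proxy_host, proxy_port), timeout=timeout) as sock:
        sock.sendall(build_proxy_request(url))
        chunks = []
        while True:
            data = sock.recv(65536)
            if not data:           # proxy closed the connection: reply complete
                break
            chunks.append(data)
    return b"".join(chunks)
```

A crawler would call something like fetch_via_proxy("https://example.com/") and parse the status line and headers out of the returned bytes. Most HTTP client libraries use a CONNECT tunnel for https:// URLs instead, which hides the content from the proxy, hence the hand-built request here.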
