Hi, I'm looking at the code of the Fetcher and have the following question: why does the Fetcher do more than fetching? Wouldn't it be better to move the page parsing to another component and let the Fetcher only fetch (so the fetch threads do nothing but fetch)? Another problem with this threaded approach is that you need a lot of threads, because each thread is responsible both for retrieving data and for parsing it. If you remove the parsing part, a thread would only be responsible for fetching. That makes it possible to use a single thread in the Fetcher that gathers data from many sockets at once, which reduces context-switching overhead. This is a technique widely used in search engines, and I'm curious why Nutch takes a different approach.
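A minimal sketch of the design suggested above: one fetch thread multiplexes many non-blocking sockets with Python's standard `selectors` module, and every raw response is handed through a queue to a separate parser thread, so the fetch thread never parses. It is written in Python for brevity, not in Nutch's Java, and the names `fetch_all`, `parse_worker` and `crawl` are hypothetical, not part of Nutch. It assumes plain HTTP/1.0 (the server closing the connection marks the end of a page) and resolves host names with a blocking call. A real crawler would not make these simplifications.

```python
import errno
import queue
import selectors
import socket
import threading


def _drop(sel, sock):
    """Stop watching a socket and close it."""
    sel.unregister(sock)
    sock.close()


def fetch_all(targets, pages, timeout=10.0):
    """Fetch every (host, port, path) in `targets` using ONE thread.

    Non-blocking sockets are multiplexed with a selector; each finished raw
    response is put on the `pages` queue instead of being parsed here.
    """
    sel = selectors.DefaultSelector()
    for host, port, path in targets:
        sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        sock.setblocking(False)
        # Simplification: a host name is resolved here with a blocking DNS
        # lookup; a real fetcher would resolve asynchronously as well.
        err = sock.connect_ex((host, port))
        if err not in (0, errno.EINPROGRESS, errno.EWOULDBLOCK):
            sock.close()  # immediate failure, e.g. connection refused
            continue
        request = f"GET {path} HTTP/1.0\r\nHost: {host}\r\n\r\n".encode()
        state = {"target": (host, port, path), "out": request, "buf": bytearray()}
        sel.register(sock, selectors.EVENT_WRITE, state)

    while sel.get_map():
        events = sel.select(timeout)
        if not events:
            break  # nothing became ready in time; abandon the rest
        for key, mask in events:
            sock, state = key.fileobj, key.data
            try:
                if mask & selectors.EVENT_WRITE:
                    # Writable means the connect finished; SO_ERROR says how.
                    if sock.getsockopt(socket.SOL_SOCKET, socket.SO_ERROR):
                        _drop(sel, sock)
                        continue
                    sent = sock.send(state["out"])
                    state["out"] = state["out"][sent:]
                    if not state["out"]:  # request fully sent: wait for data
                        sel.modify(sock, selectors.EVENT_READ, state)
                elif mask & selectors.EVENT_READ:
                    chunk = sock.recv(65536)
                    if chunk:
                        state["buf"] += chunk
                    else:  # HTTP/1.0: server closing the connection ends the page
                        _drop(sel, sock)
                        pages.put((state["target"], bytes(state["buf"])))
            except OSError:
                _drop(sel, sock)  # reset, refused, etc.: skip this target

    for key in list(sel.get_map().values()):  # close leftovers after a timeout
        _drop(sel, key.fileobj)
    sel.close()


def parse_worker(pages, results):
    """Consume raw pages until the None sentinel; 'parsing' is deliberately trivial."""
    while True:
        item = pages.get()
        if item is None:
            break
        target, raw = item
        head, _, body = raw.partition(b"\r\n\r\n")
        status_line = head.split(b"\r\n", 1)[0].decode("latin-1")
        results[target] = (status_line, body)


def crawl(targets):
    """Run one fetch thread (the caller) and one parser thread, joined by a queue."""
    pages = queue.Queue()
    results = {}
    parser = threading.Thread(target=parse_worker, args=(pages, results))
    parser.start()
    try:
        fetch_all(targets, pages)
    finally:
        pages.put(None)  # sentinel: no more pages are coming
        parser.join()
    return results
```

Because the fetch thread never parses, slow parsing only lengthens the queue and does not leave open sockets idle. Parsing could then be scaled separately, for example by adding more `parse_worker` threads on the same queue, without adding fetch threads.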
Kind regards,
Peter Veentjer
Anchor Men Interactive Solutions - clarity in business internet solutions
Praediniussingel 41, 9711 AE Groningen
T: 050-3115222
F: 050-5891696
I: www.anchormen.nl
