Andrzej Bialecki wrote:
> Reading the other day the searchenginewatch forum I came to conclusion
> that currently Nutch is rather careless about the bandwidth

To be really economical with bandwidth, the search engine should fetch only as much as it needs to present search hits. Instead of just registering whether the page has changed (and how often), it could also register how often the page has been shown in a query hit list. If all users only query for topics in metallurgy, it is quite useless to fetch new versions of a page on entomology (assuming that the page will stay on topic).

Especially with a do-it-yourself search engine like Nutch, I would guess there are many applications that target small user communities with a narrow focus.

However, updating the database for every search query might be more expensive than fetching a few more pages. It depends on how many queries and how many pages you have.

-- 
Lars Aronsson ([EMAIL PROTECTED])
Aronsson Datateknik - http://aronsson.se/

_______________________________________________
Nutch-developers mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/nutch-developers
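
A minimal sketch of the idea above, written in Python for brevity and not taken from Nutch's code. It assumes each page carries three counters (fetches, observed changes, and appearances in hit lists) and ranks refetch candidates by change rate multiplied by hit count. The names PageStats, refetch_score and pick_refetch, the example URLs, and the scoring formula itself are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class PageStats:
    """Per-page counters; all field names here are hypothetical."""
    url: str
    fetches: int = 0      # how many times the page has been fetched
    changes: int = 0      # how many of those fetches found new content
    hits_shown: int = 0   # how many times the page appeared in a hit list


def refetch_score(page: PageStats) -> float:
    """Estimated value of refetching: observed change rate times user demand."""
    if page.fetches == 0:
        return float("inf")  # never fetched, so fetch it at least once
    change_rate = page.changes / page.fetches
    return change_rate * page.hits_shown


def pick_refetch(pages, budget):
    """Return the URLs of the `budget` pages with the highest score."""
    ranked = sorted(pages, key=refetch_score, reverse=True)
    return [p.url for p in ranked[:budget]]


pages = [
    PageStats("http://example.org/metallurgy", fetches=10, changes=5, hits_shown=40),
    PageStats("http://example.org/entomology", fetches=10, changes=9, hits_shown=0),
    PageStats("http://example.org/alloys", fetches=10, changes=2, hits_shown=30),
]
print(pick_refetch(pages, budget=2))
```

With these made-up numbers the entomology page changes most often but is never shown to anyone, so it scores zero and is skipped. As for the cost concern, one possible approach (again an assumption) would be to accumulate hits_shown from query logs in periodic batches instead of writing to the database on every query.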
