I noticed the same thing: the outlinks are fetched during subsequent runs even though URL filters are in place.
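
For reference, here is a minimal sketch of what I would expect a URL filter to do with outlinks. It assumes first-match-wins rules written as "+regex" (accept) or "-regex" (reject), in the style of a Nutch regex URL filter file. The rule strings, the example.com hosts, and the function names are all made up for illustration; this is not Nutch's own code and does not explain why the outlinks get through.

```python
import re

# Hypothetical rules in the style of a regex URL filter file:
# a leading '+' accepts, a leading '-' rejects, and the first
# rule whose pattern matches the URL decides the outcome.
RULES = [
    "+^http://([a-z0-9]*\\.)*example\\.com/",  # allow the seed site (assumed host)
    "-.",                                      # reject everything else
]


def compile_rules(lines):
    """Turn '+regex' / '-regex' lines into (allow, compiled_pattern) pairs."""
    compiled = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        sign, pattern = line[0], line[1:]
        if sign not in "+-":
            raise ValueError("rule must start with '+' or '-': " + line)
        compiled.append((sign == "+", re.compile(pattern)))
    return compiled


def accepts(rules, url):
    """Return True if the first matching rule is an accept rule."""
    for allow, pattern in rules:
        if pattern.search(url):
            return allow
    return False  # no rule matched: treat the URL as rejected


def filter_outlinks(rules, outlinks):
    """Keep only the outlinks that pass the filter."""
    return [url for url in outlinks if accepts(rules, url)]
```

With rules like these, calling filter_outlinks on the links extracted from a seed page should drop every link that leaves the allowed hosts, which is the behaviour the filters were expected to give on every run, not only the first.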
-byron

--- carmmello <[EMAIL PROTECTED]> wrote:

> When someone uses the crawl method with, let's say, 100 sites, you
> establish your URL filters to allow only those sites. In the first run,
> just those 100 sites are indexed, but in subsequent runs, the outlinks
> are indexed too, together with other hops of the seed sites. This is
> fine, as someone gets some really good related sites, but if those
> sites do not comply with the URL filter, how come they are indexed?

_______________________________________________
Nutch-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-general
