When you use the crawl method with, let's say, 100 sites, you set up your
URL filters to allow only those sites.  In the first run, only those 100
sites are indexed, but in subsequent runs the outlinks are indexed too,
along with further hops out from the seed sites.  This is fine, since you
end up with some really good related sites, but if those sites do not
comply with the URL filter, how are they being indexed?
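
For reference, my filter entries look roughly like this (example.com just
stands in for each of the real seed hosts):

    # skip file:, ftp:, and mailto: URLs
    -^(file|ftp|mailto):
    # accept anything on the seed host and its subdomains
    +^http://([a-z0-9]*\.)*example.com/
    # reject everything else
    -.

With a "-." catch-all at the end like that, I would expect URLs outside the
seed hosts to be rejected before they are ever fetched, which is why the
extra sites showing up in the index surprised me.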



