Aled Jones wrote:
> Hi
> Is there a way to remove certain urls from a crawled set of data?

Please see the PruneIndexTool. It removes only the index entries; the
content itself stays in the segments. You will no longer get hits for
these URLs, but nothing stops them from being collected again in the
next round of fetching. To prevent that, you also need to modify your
URLFilters so that they reject these URLs.
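
To make the URLFilters part concrete, here is a rough standalone sketch of
how an ordered list of "+regex" / "-regex" rules can decide whether a URL
is kept. It is not Nutch code: the rule text, the example.com URLs and the
names parse_rules and accepts are made up for illustration, and the
"first matching rule wins, no match means reject" behaviour is my
assumption about how such a filter is typically set up, so check it
against the filter configuration you actually use.

```python
import re

# Hypothetical rules: one per line, "+" keeps a URL, "-" drops it.
# example.com is a placeholder host, not taken from the original message.
RULES = r"""
# drop everything under the section we no longer want
-^http://www\.example\.com/private/
# keep the rest of the site
+^http://www\.example\.com/
"""


def parse_rules(text):
    """Turn the rule text into a list of (allow, compiled_regex) pairs."""
    rules = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        sign, pattern = line[0], line[1:]
        if sign not in "+-":
            raise ValueError("rule must start with + or -: " + line)
        rules.append((sign == "+", re.compile(pattern)))
    return rules


def accepts(rules, url):
    """Return True if the URL should be kept (assumed: first match wins)."""
    for allow, pattern in rules:
        if pattern.search(url):
            return allow
    return False  # assumed default: a URL matching no rule is rejected


if __name__ == "__main__":
    rules = parse_rules(RULES)
    for url in ("http://www.example.com/index.html",
                "http://www.example.com/private/a.html"):
        print(url, accepts(rules, url))
```

The order matters in this sketch: the "-" rule has to come before the
broader "+" rule, otherwise the unwanted URLs would already be accepted by
the time the exclusion is checked.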
--
Best regards,
Andrzej Bialecki <><
Information Retrieval, Semantic Web
Embedded Unix, System Integration
http://www.sigram.com Contact: info at sigram dot com
_______________________________________________
Nutch-general mailing list
https://lists.sourceforge.net/lists/listinfo/nutch-general