Aled Jones wrote:

Hi

Is there a way to remove certain urls from a crawled set of data?

Please see the PruneIndexTool. It removes only the index entries; the content itself stays in the segments. You will therefore no longer see hits from these URLs, but nothing stops the same URLs from being collected again in the next round of fetching. To prevent that, you need to modify your URLFilters.
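
To illustrate the URL filter side, here is a rough Python sketch of how an ordered list of accept/reject regex rules behaves. It is not Nutch code. It assumes the rule style used by the regex URL filter (a leading '-' rejects, a leading '+' accepts, the first matching rule wins, and a URL that matches no rule is dropped). The example.com path and the names parse_rules and url_filter are made up for the example.

```python
import re

# Hypothetical rules, written in the '+'/'-' prefix style assumed above.
RULES = r"""
# drop everything under this (made-up) path
-^http://www\.example\.com/private/
# accept anything else
+.
"""

def parse_rules(text):
    """Turn rule text into an ordered list of (accept, compiled_regex) pairs."""
    rules = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        sign, pattern = line[0], line[1:]
        if sign not in ("+", "-"):
            raise ValueError("rule must start with '+' or '-': " + line)
        rules.append((sign == "+", re.compile(pattern)))
    return rules

def url_filter(url, rules):
    """Return the URL if it is accepted, or None if it is rejected."""
    for accept, pattern in rules:
        if pattern.search(url):
            # the first matching rule decides
            return url if accept else None
    return None  # assumption: no matching rule means the URL is dropped

rules = parse_rules(RULES)
for url in ("http://www.example.com/private/a.html",
            "http://www.example.com/public/a.html"):
    print(url, "->", "kept" if url_filter(url, rules) else "dropped")
```

The order matters: putting the '-' rule before the catch-all '+.' is what keeps the unwanted URLs out of the next fetch round.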

--
Best regards,
Andrzej Bialecki     <><
___. ___ ___ ___ _ _   __________________________________
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com

