Prune is fine for removing documents from the index, but it will not prevent
the pages from being refetched. To stop that, you may also want to change
regex-urlfilter (or crawl-urlfilter if you are using the crawl tool) so the
unwanted URLs are excluded.
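The comments in Nutch's default regex-urlfilter.txt describe the rule format: each non-comment line is a regular expression prefixed by '+' (include) or '-' (ignore), the first matching pattern decides, and a URL that matches no pattern is ignored. To show why a '-' rule placed before the catch-all stops the refetch, here is a minimal standalone sketch of that first-match logic. It is not Nutch's own filter class; the class name UrlFilterSketch and the example rules and URLs are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical standalone sketch of first-match URL filtering,
// not Nutch's actual URL filter implementation.
public class UrlFilterSketch {

    // One parsed rule: '+' (include) or '-' (ignore) plus a regex.
    private static final class Rule {
        final boolean accept;
        final Pattern pattern;

        Rule(boolean accept, Pattern pattern) {
            this.accept = accept;
            this.pattern = pattern;
        }
    }

    private final List<Rule> rules = new ArrayList<>();

    // Parses "+regex" / "-regex" lines; blank lines and '#' comments are skipped.
    public UrlFilterSketch(List<String> lines) {
        for (String raw : lines) {
            String line = raw.trim();
            if (line.isEmpty() || line.startsWith("#")) {
                continue;
            }
            char sign = line.charAt(0);
            if (sign != '+' && sign != '-') {
                throw new IllegalArgumentException("Invalid rule: " + line);
            }
            rules.add(new Rule(sign == '+', Pattern.compile(line.substring(1))));
        }
    }

    // Returns true if the URL is kept (and so eligible for fetching).
    // The first rule whose regex is found anywhere in the URL decides;
    // a URL matching no rule is ignored.
    public boolean accepts(String url) {
        for (Rule rule : rules) {
            if (rule.pattern.matcher(url).find()) {
                return rule.accept;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Hypothetical rules: ignore one directory, include everything else.
        UrlFilterSketch filter = new UrlFilterSketch(List.of(
            "# skip pages that were pruned from the index",
            "-^http://www\\.example\\.com/private/",
            "+."));
        System.out.println(filter.accepts("http://www.example.com/private/a.html")); // false
        System.out.println(filter.accepts("http://www.example.com/public/b.html"));  // true
    }
}
```

Because the first match wins, the exclusion has to appear before any broad '+' rule; if it were placed after "+.", it would never be reached.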

Regards, Thomas

On 6/22/06, Dima Mazmanov <[EMAIL PROTECTED]> wrote:
Hi, Rajesh.

Use the "prune" tool:
./nutch prune /path/to/segments/dir /path/to/file/with/rules

You wrote on 21 June 2006, 20:35:34:

> I would like to delete certain documents from the crawled documents
> based on certain criteria. Is there a way to achieve this? My guess
> is that Nutch downloads all the files before parsing them.

--
Regards,
 Dima                          mailto:[EMAIL PROTECTED]
