Prune is fine for removing the docs from the index, but it will not
prevent the pages from being refetched, so you might also want to change
the regex-urlfilter (or crawl-urlfilter if you are using the crawl tool)
for that purpose.
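
As a rough sketch, assuming the pages you want to drop live under a
hypothetical http://www.example.com/private/ path, an exclusion rule in
conf/regex-urlfilter.txt could look like this. Each line is a '+' (accept)
or '-' (reject) followed by a regex, and the first matching rule wins, so
the exclusion has to come before the catch-all:

    # reject the unwanted pages (host and path are made up for illustration)
    -^http://www\.example\.com/private/
    # accept everything else
    +.

Adjust the pattern to whatever criteria you used for pruning, so the
fetcher skips the same pages you removed from the index.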

Rgrds, Thomas

On 6/22/06, Dima Mazmanov <[EMAIL PROTECTED]> wrote:
> Hi, Rajesh.
>
> Use the "prune" tool.
> ./nutch prune /path/to/segments/dir /path/to/file/with/rules
>
> You wrote on 21 June 2006, 20:35:34:
>
> > I would like to delete certain documents from the crawled documents
> > based on certain criteria. Is there a way to achieve this? My guess
> > is that Nutch downloads all the files before parsing them.
>
> --
> Regards,
>  Dima                          mailto:[EMAIL PROTECTED]
>
>

_______________________________________________
Nutch-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-general
