Prune is ok to remove the docs from the index, but it will not prevent the pages from being refetched, so you might also want to change the regex-urlfilter (or crawl-urlfilter if you are using the crawl tool) for that purpose.
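For illustration, an exclude rule in conf/regex-urlfilter.txt (or conf/crawl-urlfilter.txt when using the crawl tool) could look like the sketch below; the host and path are only placeholders, adjust them to whatever criteria you want to block:

    # skip anything under the /private/ area of this (hypothetical) site
    -^http://www\.example\.com/private/
    # accept everything else
    +.

Rules are applied top to bottom and the first matching pattern wins, so the exclude line has to come before the catch-all "+." line.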
Regards,
Thomas

On 6/22/06, Dima Mazmanov <[EMAIL PROTECTED]> wrote:
> Hi, Rajesh.
>
> Use the "prune" tool:
> ./nutch prune /path/to/segments/dir /path/to/file/with/rules
>
> You wrote on 21 June 2006, 20:35:34:
>
> > I would like to delete certain documents from the crawled documents
> > depending on certain criteria. Is there a way to achieve this? My guess
> > is, nutch downloads all the files before parsing them.
>
> --
> Regards,
> Dima    mailto:[EMAIL PROTECTED]
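As a rough sketch of the rules file passed to the prune command above: as far as I know it holds Lucene queries, one per line, and documents matching any query are removed from the index. The field names and values below are only illustrative, and the exact query syntax accepted depends on the PruneIndexTool in your Nutch version:

    # remove documents whose URL contains this (hypothetical) path
    url:"http://www.example.com/private"
    # remove documents with this word in the title
    title:confidential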
