Hi, Rajesh.

Use the "prune" tool:
./nutch prune /path/to/segments/dir /path/to/file/with/rules

You wrote on 21 June 2006, 20:35:34:

> I would like to delete certain documents from the crawled documents
> depending on a certain criteria. Is there a way to achieve this? My guess
> is, nutch downloads all the files before parsing it.


-- 
Regards,
 Dima                          mailto:[EMAIL PROTECTED]

