I would like to delete certain documents from the crawled documents
depending on certain criteria. Is there a way to achieve this? My guess
is that Nutch downloads all the files before parsing them.
