I would like to delete certain documents from the set of crawled documents based on certain criteria. Is there a way to achieve this? My guess is that Nutch downloads all the files before parsing them.
_______________________________________________
Nutch-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-general
