Hello,

Nutch is trying to crawl everything, including DLL, EXE and all other
non-textual formats. How can I limit Nutch to only certain desirable
content types? I know this can be done by editing the urlfilter
plugin settings, but it is hard to predetermine all possible
extensions, so this technique is unreliable.
Is it possible to limit the crawler to fetching only specific
content types, or at least to have only those indexed?

