Any information on this? I really need to limit what Nutch indexes
(only textual formats, excluding CSS, JavaScript, and other data not
meant for human readers).

> Nutch is trying to crawl everything, including DLL, EXE and all
> non-textual formats. How can I limit Nutch to only some desirable
> content types? I know it's possible to do this by editing the urlfilter
> plugin settings, but it's hard to predetermine all the possible
> extensions, so this technique is unreliable.
> Is it possible to limit the crawler to fetch only certain
> content types, or at least to index only those?
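To illustrate why filtering on the Content-Type header is more reliable than filtering on URL extensions, here is a minimal standalone sketch in Python. It is NOT Nutch code or configuration. The allowlist, the blocked-extension list and both helper functions (`is_indexable`, `has_blocked_extension`) are hypothetical names chosen for the example. The extension check misses a URL like `/get?file=7` that may serve an EXE. A MIME allowlist rejects anything not explicitly listed, whatever the URL looks like.

```python
# Hypothetical illustration only: not Nutch's API or configuration.
from typing import Optional
from urllib.parse import urlparse

# Assumed allowlist of human-oriented textual MIME types.
ALLOWED_TYPES = {"text/html", "text/plain", "application/xhtml+xml"}

# Assumed denylist, in the style of a URL-extension filter.
BLOCKED_EXTENSIONS = (".exe", ".dll", ".css", ".js", ".gif", ".jpg", ".png", ".zip")


def has_blocked_extension(url: str) -> bool:
    """Extension-based check: only looks at the URL path suffix."""
    path = urlparse(url).path.lower()
    return path.endswith(BLOCKED_EXTENSIONS)


def is_indexable(content_type_header: Optional[str]) -> bool:
    """Content-type check: True only for MIME types on the allowlist."""
    if not content_type_header:
        # No declared type: treat as not indexable.
        return False
    # Drop parameters such as "; charset=UTF-8" and normalize case.
    mime = content_type_header.split(";", 1)[0].strip().lower()
    return mime in ALLOWED_TYPES
```

For example, `has_blocked_extension("http://example.com/get?file=7")` is `False`, so an extension filter would let that URL through. If the server answers with `application/x-msdownload`, `is_indexable` still rejects it.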
