Hello, Nutch is trying to crawl everything, including DLL and EXE files and other non-textual formats. How can I limit Nutch to only the desired content types? I know this can be done by editing the urlfilter plugin settings, but it is hard to anticipate every possible extension, so that technique is unreliable. Is it possible to restrict the crawler to fetching only specific content types, or at least to have only those indexed?
- content-type crawling problem Eugen Kochuev
