Sorry, I forgot to mention that I'm running a 2.x build taken a few
weeks ago.


On Wed, Jun 12, 2013 at 8:31 AM, Bai Shen <[email protected]> wrote:

> I'm dealing with a lot of file types that I don't want to index.  I was
> originally using the regex filter to exclude them but it was getting out of
> hand.
>
> I changed my plugin includes from
>
> urlfilter-regex
>
> to
>
> urlfilter-(regex|suffix)
>
> I've tried both using the default urlfilter-suffix.txt file, adding the
> extensions I don't want, and making my own file that starts with + and
> lists the extensions I do want.
>
> Neither of these approaches seems to work.  I continue to get URLs added to
> the database which contain extensions I don't want.  Even adding a
> urlfilter.order section to my nutch-site.xml doesn't work.
>
> I don't see any obvious bugs in the code, so I'm a bit stumped.  Any
> suggestions for what else to look at?
>
> Thanks.
>
