We use that filter exactly as it ships with Nutch; simply enabling it in
plugin.includes works for us. To test your filters, you can run
org.apache.nutch.net.URLFilterChecker via bin/nutch.
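
Something along these lines should do as a quick check. It is only a sketch:
it assumes you run it from the Nutch runtime directory, that your release's
URLFilterChecker reads URLs from stdin and accepts -allCombined (the 1.x
versions I know of do; run the class without arguments to see its usage
text), and the two example.com URLs are made up for illustration.

```shell
# Assumption: bin/nutch lives in the current directory (e.g. runtime/local).
# Set NUTCH to the launcher path if it is somewhere else.
NUTCH="${NUTCH:-bin/nutch}"

if [ -x "$NUTCH" ]; then
  # Feed one URL per line on stdin; both URLs are hypothetical examples.
  # -allCombined runs every URL filter enabled in plugin.includes.
  printf '%s\n' \
    'http://example.com/report.pdf' \
    'http://example.com/index.html' \
    | "$NUTCH" org.apache.nutch.net.URLFilterChecker -allCombined
else
  echo "nutch launcher not found at $NUTCH" >&2
fi
```

Accepted URLs should come back prefixed with + and rejected ones with -. If an
extension you listed still comes back with +, the suffix filter is probably
not being loaded or is reading a different rules file than the one you edited.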
-----Original message-----
> From: Bai Shen <[email protected]>
> Sent: Wed 12-Jun-2013 14:32
> To: [email protected]
> Subject: Suffix URLFilter not working
>
> I'm dealing with a lot of file types that I don't want to index. I was
> originally using the regex filter to exclude them but it was getting out of
> hand.
>
> I changed my plugin includes from
>
> urlfilter-regex
>
> to
>
> urlfilter-(regex|suffix)
>
> I've tried both using the default urlfilter-suffix.txt file, adding the
> extensions I don't want, and making my own file that starts with + and
> includes the extensions I do want.
>
> Neither of these approaches seems to work. I continue to get urls added to
> the database which contain extensions I don't want. Even adding a
> urlfilter.order section to my nutch-site.xml doesn't work.
>
> I don't see any obvious bugs in the code, so I'm a bit stumped. Any
> suggestions for what else to look at?
>
> Thanks.
>