We happily use that filter exactly as it is shipped with Nutch. Just enabling it 
in plugin.includes works for us. To make testing easier, you can run bin/nutch 
org.apache.nutch.net.URLFilterChecker to check URLs against your filters.
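
A rough sketch of how such a check might look, run from the Nutch runtime directory. The sample URLs and the urls.txt file name are made up for illustration. The -allCombined flag is what the 1.x usage message lists as far as I know, so check the usage output of your own version before relying on it.

```shell
#!/bin/sh
# Hypothetical sample URLs, one per line; replace with URLs from your own crawl.
printf '%s\n' \
  'http://example.com/index.html' \
  'http://example.com/report.pdf' \
  'http://example.com/archive.zip' > urls.txt

# The checker reads URLs from stdin. -allCombined (assumed from the 1.x usage
# message) applies every filter enabled in plugin.includes, in order.
if [ -x bin/nutch ]; then
  bin/nutch org.apache.nutch.net.URLFilterChecker -allCombined < urls.txt
else
  echo "bin/nutch not found; run this from the Nutch runtime directory" >&2
fi
```

In the versions I have used, each URL is echoed back with a leading + if it passes the filters or a leading - if it is rejected, which shows quickly whether the suffix filter is being applied at all.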
 
 
-----Original message-----
> From:Bai Shen <[email protected]>
> Sent: Wed 12-Jun-2013 14:32
> To: [email protected]
> Subject: Suffix URLFilter not working
> 
> I'm dealing with a lot of file types that I don't want to index.  I was
> originally using the regex filter to exclude them but it was getting out of
> hand.
> 
> I changed my plugin includes from
> 
> urlfilter-regex
> 
> to
> 
> urlfilter-(regex|suffix)
> 
> I've tried using both the default urlfilter-suffix.txt file via adding the
> extensions I don't want and making my own file that starts with + and
> includes the extensions I do want.
> 
> Neither of these approaches seems to work.  I continue to get urls added to
> the database which contain extensions I don't want.  Even adding a
> urlfilter.order section to my nutch-site.xml doesn't work.
> 
> I don't see any obvious bugs in the code, so I'm a bit stumped.  Any
> suggestions for what else to look at?
> 
> Thanks.
> 
