I figured as much, which is why I'm not sure why it's not working for me. I ran bin/nutch org.apache.nutch.net.URLFilterChecker http://myserver/myurl and it's been thirty minutes with no results.
Is there something I should run before running that?

Thanks.


On Wed, Jun 12, 2013 at 8:34 AM, Markus Jelsma <markus.jel...@openindex.io> wrote:

> We happily use that filter just as it is shipped with Nutch. Just enabling
> it in plugin.includes works for us. To ease testing you can use the
> bin/nutch org.apache.nutch.net.URLFilterChecker to test filters.
>
>
> -----Original message-----
> > From:Bai Shen <baishen.li...@gmail.com>
> > Sent: Wed 12-Jun-2013 14:32
> > To: user@nutch.apache.org
> > Subject: Suffix URLFilter not working
> >
> > I'm dealing with a lot of file types that I don't want to index. I was
> > originally using the regex filter to exclude them but it was getting out
> > of hand.
> >
> > I changed my plugin includes from
> >
> > urlfilter-regex
> >
> > to
> >
> > urlfilter-(regex|suffix)
> >
> > I've tried using both the default urlfilter-suffix.txt file via adding the
> > extensions I don't want and making my own file that starts with + and
> > includes the extensions I do want.
> >
> > Neither of these approaches seems to work. I continue to get urls added to
> > the database which contain extensions I don't want. Even adding a
> > urlfilter.order section to my nutch-site.xml doesn't work.
> >
> > I don't see any obvious bugs in the code, so I'm a bit stumped. Any
> > suggestions for what else to look at?
> >
> > Thanks.
> >
>
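
For reference, here is a minimal Python sketch of how a suffix rules file with a leading + or - could be interpreted. It is not Nutch's implementation. It assumes the mode line works the way I understand the urlfilter-suffix documentation: "+" accepts everything except the listed suffixes, "-" rejects everything except the listed suffixes, and an optional "I" makes matching case-insensitive. The names load_rules and accepts are hypothetical, and the sketch matches against the whole URL string.

```python
def load_rules(text):
    """Parse a rules file: one mode line ('+' or '-', optional 'I'), then suffixes."""
    allow_unknown = True   # assumed default when no mode line is present
    ignore_case = False
    suffixes = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        if line[0] in "+-":
            allow_unknown = line[0] == "+"
            ignore_case = "I" in line[1:]
            continue
        suffixes.append(line)
    if ignore_case:
        suffixes = [s.lower() for s in suffixes]
    return allow_unknown, ignore_case, suffixes


def accepts(url, rules):
    """Return True if the URL passes the filter under the assumed semantics."""
    allow_unknown, ignore_case, suffixes = rules
    target = url.lower() if ignore_case else url
    listed = any(target.endswith(s) for s in suffixes)
    # '+' mode: listed suffixes are rejected; '-' mode: only listed suffixes pass.
    return listed != allow_unknown


# Exclusion list: accept everything except .gif and .zip, ignoring case.
exclude = load_rules("+I\n.gif\n.zip\n")
# Whitelist: reject everything except .html.
whitelist = load_rules("# keep HTML only\n-\n.html\n")
```

If that reading of the mode line is right, a file that starts with + and lists the extensions to keep would reject exactly those extensions, so a whitelist would need to start with - instead.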