I figured as much, which is why I'm not sure why it's not working for me.

I ran bin/nutch org.apache.nutch.net.URLFilterChecker
http://myserver/myurl and it's been thirty minutes with no results.

Is there something I should run before running that?

Thanks.


On Wed, Jun 12, 2013 at 8:34 AM, Markus Jelsma
<markus.jel...@openindex.io> wrote:

> We happily use that filter just as it is shipped with Nutch. Just enabling
> it in plugin.includes works for us. To ease testing you can use the
> bin/nutch org.apache.nutch.net.URLFilterChecker to test filters.
>
>
> -----Original message-----
> > From: Bai Shen <baishen.li...@gmail.com>
> > Sent: Wed 12-Jun-2013 14:32
> > To: user@nutch.apache.org
> > Subject: Suffix URLFilter not working
> >
> > I'm dealing with a lot of file types that I don't want to index.  I was
> > originally using the regex filter to exclude them, but it was getting out
> > of hand.
> >
> > I changed my plugin includes from
> >
> > urlfilter-regex
> >
> > to
> >
> > urlfilter-(regex|suffix)
> >
> > I've tried both using the default urlfilter-suffix.txt file, adding the
> > extensions I don't want, and making my own file that starts with + and
> > includes the extensions I do want.
> >
> > Neither of these approaches seems to work.  I continue to get URLs added to
> > the database which contain extensions I don't want.  Even adding a
> > urlfilter.order section to my nutch-site.xml doesn't work.
> >
> > I don't see any obvious bugs in the code, so I'm a bit stumped.  Any
> > suggestions for what else to look at?
> >
> > Thanks.
> >
>
