Tom,

You can test your filters by running

  ./nutch plugin urlfilter-regex org.apache.nutch.urlfilter.regex.RegexURLFilter

and then entering a URL to check whether or not it is filtered.
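
If you want to reason about a rule outside Nutch, here is a rough Java sketch of how I understand regex-urlfilter.txt to be evaluated. It is my own simplified re-implementation, not Nutch's RegexURLFilter class, and the class name RegexRuleSketch and its methods are hypothetical. The assumptions are that rules are tried top to bottom, the first pattern found anywhere in the URL decides ('+' accepts, '-' rejects), and a URL that matches no rule is rejected. It also shows why rule order matters: a catch-all such as "+." placed above your "-" line would accept the URL before the exclusion is ever tried.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

/**
 * Hypothetical, simplified model of regex-urlfilter.txt evaluation.
 * Not Nutch's RegexURLFilter: just an illustration of first-match-wins rules.
 */
public class RegexRuleSketch {

    /** One parsed rule: accept ('+') or reject ('-'), plus its compiled pattern. */
    private record Rule(boolean accept, Pattern pattern) {}

    /** Parses rule lines, skipping blank lines and '#' comments. */
    private static List<Rule> parse(List<String> lines) {
        List<Rule> rules = new ArrayList<>();
        for (String raw : lines) {
            String line = raw.trim();
            if (line.isEmpty() || line.startsWith("#")) {
                continue;
            }
            char sign = line.charAt(0);
            if (sign != '+' && sign != '-') {
                throw new IllegalArgumentException("Rule must start with + or -: " + line);
            }
            rules.add(new Rule(sign == '+', Pattern.compile(line.substring(1))));
        }
        return rules;
    }

    /** Returns true if the URL is accepted; a URL matching no rule is rejected. */
    public static boolean accepts(List<String> lines, String url) {
        for (Rule rule : parse(lines)) {
            // find() rather than matches(): the pattern only has to occur somewhere in the URL
            if (rule.pattern().matcher(url).find()) {
                return rule.accept();
            }
        }
        return false;
    }

    public static void main(String[] args) {
        String url = "http://blog2.de/fotos/tags/080807/photo/1150136437/DSC0717.html";
        // Exclusion listed before the catch-all: the exclusion matches first, so the URL is rejected.
        List<String> exclusionFirst = List.of("-^http://blog2.de/fotos/tags/", "+.");
        // Catch-all listed first: it matches before the exclusion is tried, so the URL is accepted.
        List<String> catchAllFirst = List.of("+.", "-^http://blog2.de/fotos/tags/");
        System.out.println((accepts(exclusionFirst, url) ? "+" : "-") + url);
        System.out.println((accepts(catchAllFirst, url) ? "+" : "-") + url);
    }
}
```

If the real plugin test above still reports your URL as accepted, check whether an earlier "+" line in the file is matching it first.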

If you are in a distributed environment, the filters in the conf directory of
your master are not used: you need to regenerate the job file, since that is
what the slaves use.

HTH

Julien

-- 
DigitalPebble Ltd
http://www.digitalpebble.com

On 19 May 2010 16:05, Tom Landvoigt wrote:

> Hi,
>
> I have a little problem.
>
> My crawldb contains URLs like
> http://blog2.de/fotos/tags/080807/photo/1150136437/DSC0717.html, but I
> don't want to crawl them.
>
> So I put a line in my regex-urlfilter.txt:
>
> -^http://blog2.de/fotos/tags/
>
> But when I generate a segment, the URL is still in it. Can someone help
> me with this?
>
> Thanks a lot
>
> ---------------------
>
> Tom Landvoigt
