Tom,

You can test your filters using *./nutch plugin urlfilter-regex org.apache.nutch.urlfilter.regex.RegexURLFilter* then enter a URL to check whether it is filtered or not.

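
To make it clearer what that check is evaluating, here is a minimal Python sketch of how a regex-urlfilter.txt style rule list is applied. It is not Nutch's code, only an illustration that assumes the behaviour described in the comments of the default regex-urlfilter.txt: rules are tried top to bottom, the first pattern found anywhere in the URL decides, and a URL that matches nothing is ignored. The names parse_rules and accepts, and the trailing "+." catch-all rule, are made up for this example.

```python
import re


def parse_rules(text):
    """Parse '+regex' / '-regex' lines, skipping blanks and '#' comments."""
    rules = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        sign, pattern = line[0], line[1:]
        if sign not in "+-":
            raise ValueError("rule must start with '+' or '-': " + line)
        # True means "accept the URL", False means "reject it".
        rules.append((sign == "+", re.compile(pattern)))
    return rules


def accepts(rules, url):
    """Return the verdict of the first rule whose pattern is found in url."""
    for accept, pattern in rules:
        if pattern.search(url):
            return accept
    # Assumption: a URL that matches no rule is rejected.
    return False


# Hypothetical filter file: the rule from the original question,
# followed by an assumed catch-all that accepts everything else.
RULES = """
# skip the photo tag pages
-^http://blog2.de/fotos/tags/
# accept anything else
+.
"""

rules = parse_rules(RULES)
url = "http://blog2.de/fotos/tags/080807/photo/1150136437/DSC0717.html"

print(accepts(rules, url))                           # False
print(accepts(rules, "http://blog2.de/index.html"))  # True
```

Under these assumptions the order of the lines matters: if the "+." line came before the "-" rule, the first match would accept the URL and the exclusion would never be reached.
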
If you are in a distributed environment, the filters in the conf dir of your master are not used: you need to regenerate a job file, as it is what the slaves use.

HTH

Julien

--
DigitalPebble Ltd
http://www.digitalpebble.com

On 19 May 2010 16:05, Tom Landvoigt <[email protected]> wrote:

> Hi,
>
> I have a little problem.
>
> In my crawldb are urls like
> http://blog2.de/fotos/tags/080807/photo/1150136437/DSC0717.html but I
> don't want to crawl them.
>
> So I put a line in my regex-urlfilter.txt:
>
> -^http://blog2.de/fotos/tags/
>
> But when I generate a segment the url is still in it. Can someone help
> me with this?
>
> Thanks a lot
>
> ---------------------
>
> Tom Landvoigt

