First of all thanks Julien, I solved the problem with your help.
But I still have a question. I have to rebuild the .job file with ant, right? The rebuild only uses the regex-urlfilter.txt in /conf and doesn't use the NUTCH_CONF_DIR variable to find the conf dir. Is that right?

Thanks
Tom

-----Original Message-----
From: Julien Nioche [mailto:[email protected]]
Sent: Wednesday, 19 May 2010 17:24
To: [email protected]
Subject: Re: Regex urlfilter

Tom,

You can test your filters using

*./nutch plugin urlfilter-regex org.apache.nutch.urlfilter.regex.RegexURLFilter*

then enter a URL to check whether it is filtered or not.

If you are in a distributed environment, the filters in the conf dir of your master are not used: you need to regenerate a job file, as that is what the slaves use.

HTH

Julien

--
DigitalPebble Ltd
http://www.digitalpebble.com

On 19 May 2010 16:05, Tom Landvoigt <[email protected]> wrote:

> Hi,
>
> I have a little problem.
>
> In my crawldb are urls like
> http://blog2.de/fotos/tags/080807/photo/1150136437/DSC0717.html but I
> don't want to crawl them.
>
> So I put a line in my regex-urlfilter.txt:
>
> -^http://blog2.de/fotos/tags/
>
> But when I generate a segment the url is still in it. Can someone help
> me with this?
>
> Thanks a lot
>
> ---------------------
> Tom Landvoigt
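For anyone reading this thread later, here is a rough sketch of how a rule such as -^http://blog2.de/fotos/tags/ is expected to behave. It is not Nutch code. It assumes that rules are checked top to bottom, that the first pattern found anywhere in the URL decides, that "+" accepts and "-" rejects, and that a URL matching no rule is dropped. The helper names parse_rules and accepts, and the catch-all "+." rule, are made up for this illustration, and Python's re module stands in for Java regexes.

```python
import re

def parse_rules(text):
    """Parse '+regex' / '-regex' lines; blank lines and '#' comments are skipped."""
    rules = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        sign, pattern = line[0], line[1:]
        if sign not in ("+", "-"):
            raise ValueError("rule must start with '+' or '-': " + line)
        rules.append((sign == "+", re.compile(pattern)))
    return rules

def accepts(rules, url):
    """Assumed semantics: the first rule whose pattern is found in the URL decides."""
    for accept, pattern in rules:
        if pattern.search(url):
            return accept
    return False  # assumption: a URL that matches no rule is rejected

# Hypothetical rule file: the exclusion from this thread plus a catch-all accept.
RULES = """
# skip the photo tag pages
-^http://blog2.de/fotos/tags/
# accept anything else
+.
"""

rules = parse_rules(RULES)
blocked = accepts(rules, "http://blog2.de/fotos/tags/080807/photo/1150136437/DSC0717.html")
allowed = accepts(rules, "http://blog2.de/index.html")
print(blocked, allowed)
```

Under these assumptions the order of the lines matters: if the "+." line came first, the exclusion below it would never be reached. The unescaped dot in blog2.de also matches any character, so blog2\.de is the stricter spelling.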

