Hi Mauro,

Have a look at the domain filter plugin in the SVN version of the code. It will allow you to filter based on the TLD.
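On the regex question itself, here is a rough sketch in Python that compares the two patterns from your mail plus the apache.org one from the installation guide. It rests on two assumptions that are not confirmed in this thread: that the leading "+" is only the accept marker and not part of the expression, and that the filter looks for the pattern anywhere it can match (find-style) rather than requiring the whole URL to match. The helper name `accepts` and the sample URLs are made up for illustration.

```python
import re

# Patterns from the question, without the leading "+" (assumed to be the accept marker).
WITH_EXTRA_DOT = r"^http://([a-z0-9]*\.)*.ch/"
WITHOUT_EXTRA_DOT = r"^http://([a-z0-9]*\.)*ch/"
APACHE = r"^http://([a-z0-9]*\.)*apache.org/"


def accepts(pattern: str, url: str) -> bool:
    """Hypothetical helper: True if the pattern is found in the URL (find-style matching assumed)."""
    return re.search(pattern, url) is not None


# Illustrative URLs only; none of them come from a real crawl.
SAMPLES = [
    "http://example.ch/",
    "http://www.example.ch/",
    "http://www.example.com/",
    "http://my-site.ch/",
    "http://apache.org/",
    "http://lucene.apache.org/",
]

for url in SAMPLES:
    print(
        url,
        "extra-dot:", accepts(WITH_EXTRA_DOT, url),
        "no-extra-dot:", accepts(WITHOUT_EXTRA_DOT, url),
        "apache:", accepts(APACHE, url),
    )
```

Under those assumptions, the second form (no extra dot) is the one that accepts both example.ch and www.example.ch, because each repetition of `([a-z0-9]*\.)` already ends with the separating dot. In the first form the extra unescaped `.` stands for any single character, so a host like www.example.ch is rejected. The apache.org pattern accepts both apache.org and its subdomains, since the group may repeat zero times. Note also that `[a-z0-9]` does not cover hyphens, so hosts such as my-site.ch would not pass either pattern.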
HTH

Julien

-- 
DigitalPebble Ltd
http://www.digitalpebble.com

2009/3/19 Mauro Vignati <vig...@gmail.com>

> Hi,
> I'm testing Nutch and so far everything works fine (OK, some hours spent
> reading, testing, testing and testing, but that's normal).
> I have a noob question: I have to crawl websites only within a ccTLD.
>
> In crawl-urlfilter.txt, should I write this:
>
> # accept hosts in MY.DOMAIN.NAME
> +^http://([a-z0-9]*\.)*.ch/
>
> or this:
>
> # accept hosts in MY.DOMAIN.NAME
> +^http://([a-z0-9]*\.)*ch/
>
> The difference is the dot before the "ch" ccTLD. I mean, is the dot before
> the closing bracket already separating the ccTLD from the name (or the root
> from a subdomain), or should I add one as in the first example? In the
> installation guide I can see:
>
> +^http://([a-z0-9]*\.)*apache.org/
>
> Does this crawl every subdomain of apache.org (xxx.apache.org), or does it
> crawl only apache.org?
>
> Many thanks for any help
> Mauro