You should be able to do this using one of the *-urlfilter.txt
files. Instead of putting a + in front of the regex, you can put a - in
front of it to exclude URLs that match the regex.
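
As a rough, untested sketch of what such an exclusion rule might look like
in a file such as conf/regex-urlfilter.txt (example.org is a placeholder
host, not one taken from this thread; rules are checked from top to bottom
and the first matching rule decides):

```
# Placeholder host: skip every URL on example.org and its subdomains
-^https?://([a-z0-9-]+\.)*example\.org/

# Accept anything that was not excluded above
+.
```

The order matters here: if the +. line came first, it would match every
URL and the exclusion rule would never be reached.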
Just a guess, I haven't actually tried it, but you could probably use
something like the following. (I'm
But how can we tell Nutch to crawl this way every time?
I do not want to edit the *-urlfilter.txt file for every crawl.
Thanks,
Jun
-----Original Message-----
From: Jesse Hires [mailto:jhi...@gmail.com]
Sent: December 22, 2009 9:23
To: nutch-user@lucene.apache.org
Subject: Re: domain crawl using bin