Re: domain crawl using bin/nutch

2009-12-21 Thread Jesse Hires
You should be able to do this with one of the *-urlfilter.txt files. Instead of putting a + in front of the regex, put a - in front of it to exclude URLs that match. This is just a guess, since I haven't actually tried it, but you could probably use something like the following. (I'm
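
To make the +/- idea above concrete, here is a minimal Python sketch of how such a rule list could be evaluated. It is not Nutch's own code and is not the example from the truncated message. It assumes that rules are tried top to bottom, that the first matching regex decides, and that a URL matching no rule is rejected. The hosts skip.example.com and example.com, and the helper names parse_rules and accepts, are hypothetical.

```python
import re

# Hypothetical rules in the +/- style described above.
# example.com and skip.example.com are placeholders, not real crawl targets.
RULES = r"""
# exclude one host inside the domain
-^http://skip\.example\.com/
# include the rest of the domain
+^http://([a-z0-9-]+\.)*example\.com/
"""


def parse_rules(text):
    """Turn '+regex' / '-regex' lines into (include, compiled_pattern) pairs."""
    rules = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        sign, pattern = line[0], line[1:]
        if sign not in "+-":
            raise ValueError("rule must start with + or -: " + line)
        rules.append((sign == "+", re.compile(pattern)))
    return rules


def accepts(rules, url):
    """Return the verdict of the first rule whose regex is found in the URL."""
    for include, pattern in rules:
        if pattern.search(url):
            return include
    return False  # assumption: a URL that matches no rule is rejected


rules = parse_rules(RULES)
for url in ("http://www.example.com/page",
            "http://skip.example.com/page",
            "http://other.org/"):
    print(url, accepts(rules, url))
```

Because the first match wins in this sketch, the - line has to come before the broader + line, otherwise the excluded host would already have been accepted.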

RE: domain crawl using bin/nutch

2009-12-21 Thread Jun Mao
But how can we tell Nutch to crawl this way every time? I do not want to edit *-filter.txt for every crawl.

Thanks,
Jun

-Original Message-
From: Jesse Hires [mailto:jhi...@gmail.com]
Sent: December 22, 2009 9:23
To: nutch-user@lucene.apache.org
Subject: Re: domain crawl using bin