All,
I am currently using Nutch to crawl an intranet site. I start the crawl with a single seed URL:

http://mysite.mydomain.com/guidance/wiki/index.php/sylebook

I would like Nutch to skip every URL that does not match the following pattern:

http://mysite.mydomain.com/guidance/........

Can anyone please help me with this? I appreciate your help.

Thanks,
Raj
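
P.S. To make the behaviour I am after concrete, here is a minimal standalone sketch in Java. It is not Nutch code: the class name GuidanceUrlFilter and its filter method are hypothetical, and the host mysite.mydomain.com is my reading of the seed URL above. It only shows which URLs I want kept and which skipped.

```java
import java.util.regex.Pattern;

public class GuidanceUrlFilter {
    // Assumption: the host is mysite.mydomain.com, as in the seed URL.
    // Only URLs under /guidance/ on that host should be crawled.
    private static final Pattern ALLOWED =
        Pattern.compile("^http://mysite\\.mydomain\\.com/guidance/.*");

    /** Returns the URL unchanged if it should be crawled, or null to skip it. */
    public static String filter(String url) {
        return ALLOWED.matcher(url).matches() ? url : null;
    }

    public static void main(String[] args) {
        // A URL under /guidance/ should be kept.
        System.out.println(filter("http://mysite.mydomain.com/guidance/wiki/index.php/sylebook"));
        // Anything else on the same host should be skipped.
        System.out.println(filter("http://mysite.mydomain.com/other/page.html"));
    }
}
```

What I am looking for is the way to get this same keep-or-skip behaviour from Nutch itself during the crawl.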

