All,

I am currently using Nutch to crawl an intranet site. I start the crawl
with a single seed URL:

http://mysite.mydomain.com/guidance/wiki/index.php/sylebook

What I would like to do is tell Nutch to skip all URLs that do not
match the following pattern:

http://mysite.mydomain.com/guidance/........
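For illustration only: in Nutch this kind of restriction is normally expressed in the regex URL filter file (conf/regex-urlfilter.txt, or crawl-urlfilter.txt with the one-step crawl command in some older releases). Each line there is `+` or `-` followed by a regular expression, and the first rule that matches a URL decides whether it is kept. The Python sketch below only emulates that first-match logic to show the intended rule. It is not Nutch code: the names RULES and accept_url are hypothetical, and the host is assumed to be mysite.mydomain.com.

```python
import re

# Hypothetical rule list in the style of Nutch's regex URL filter:
# each rule is ("+" or "-", regex). The first rule whose regex matches
# anywhere in the URL decides; a URL that matches no rule is rejected.
RULES = [
    ("+", r"^http://mysite\.mydomain\.com/guidance/"),  # keep the guidance section (assumed host)
    ("-", r"."),                                         # drop everything else
]

COMPILED = [(sign, re.compile(pattern)) for sign, pattern in RULES]


def accept_url(url: str) -> bool:
    """Return True if the URL passes the first-match-wins filter."""
    for sign, regex in COMPILED:
        if regex.search(url):
            return sign == "+"
    return False  # no rule matched, so the URL is rejected


if __name__ == "__main__":
    seed = "http://mysite.mydomain.com/guidance/wiki/index.php/sylebook"
    print(accept_url(seed))                                     # True
    print(accept_url("http://mysite.mydomain.com/other/page"))  # False
```

In the filter file itself, the same rule would be the two lines `+^http://mysite\.mydomain\.com/guidance/` and `-.`, placed before (or in place of) the default file's final accept-everything rule `+.`. The dots in the hostname are escaped so they match literal dots rather than any character.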

Can anyone please help me with this issue?

I appreciate your help.

Thanks

Raj