Telling Nutch to skip certain URLs

Nemani, Raj Sun, 22 Aug 2010 20:42:30 -0700

All,


I am currently using Nutch to crawl an intranet site. I start the crawl
with one seed URL, as shown below.

 

http://mysite.Mydomain.com/guidance/wiki/index.php/sylebook

 

What I would like to do is tell Nutch to skip all URLs that do not
conform to the following pattern:

 

http://mysite.Mydomain.com/guidance/........
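
To make the intent concrete, the check I have in mind is roughly the
following. This is only a Python sketch of the matching rule, not Nutch
configuration; the function name should_crawl and the second test URL
are made up for illustration, and I am assuming a simple prefix match on
the /guidance/ path is enough.

```python
import re

# Prefix taken from the pattern above; everything else is illustrative.
ALLOWED = re.compile(r"^http://mysite\.Mydomain\.com/guidance/")


def should_crawl(url: str) -> bool:
    """Return True only for URLs under the /guidance/ path of the site."""
    return ALLOWED.match(url) is not None


# The seed URL is kept; a URL outside /guidance/ is skipped.
print(should_crawl("http://mysite.Mydomain.com/guidance/wiki/index.php/sylebook"))  # True
print(should_crawl("http://mysite.Mydomain.com/other/page"))  # False
```

Anything for which this returns False is what I would like Nutch to
skip.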

 

Can anyone please help me with this issue?

I appreciate your help.

 

Thanks

Raj
