RE: Relative urls are not crawled ?

Thumuluri, Sai Tue, 21 Sep 2010 07:42:55 -0700

Did you check the regex-urlfilter and crawl-urlfilter files in the Nutch
conf directory to make sure you are not excluding the relative URLs?
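
As a rough illustration of what to check, here is a minimal Python sketch
(not Nutch code). It assumes that relative links are resolved against the
page URL before filtering, and that filter rules are lines of the form
"+regex" or "-regex" where the first matching rule decides and a URL that
matches no rule is dropped. The host www.example.com, the RULES list, and
the helper names are hypothetical, made up for the example.

```python
import re
from urllib.parse import urljoin

# Hypothetical rules in the "+regex" / "-regex" style of a regex URL
# filter file. Illustrative only; these are not the shipped defaults.
RULES = [
    "-\\.(gif|jpg|png|css|js)$",               # skip common static assets
    "-[?*!@=]",                                # skip URLs with query-like characters
    "+^http://([a-z0-9]*\\.)*example\\.com/",  # accept the (assumed) site host
]


def accepts(url, rules=RULES):
    """Return True if the first matching rule starts with '+'.

    A URL that matches no rule at all is rejected.
    """
    for rule in rules:
        sign, pattern = rule[0], rule[1:]
        if re.search(pattern, url):
            return sign == "+"
    return False


def resolve_and_filter(base, hrefs, rules=RULES):
    """Resolve each href against base and pair it with the filter decision."""
    resolved = [urljoin(base, href) for href in hrefs]
    return [(url, accepts(url, rules)) for url in resolved]


if __name__ == "__main__":
    base = "http://www.example.com/index.html"
    links = ["/Products/default.html", "/Brands/default.html"]
    for url, ok in resolve_and_filter(base, links):
        print(("keep " if ok else "drop ") + url)
```

If the accept pattern does not match the host that the relative links
resolve to, every one of them is dropped by the filter, which would leave
only the seed page to be fetched.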

-----Original Message-----
From: Bahadir Cambel [mailto:[email protected]] 
Sent: Tuesday, September 21, 2010 10:35 AM
To: [email protected]
Subject: Relative urls are not crawled ?

Hey guys,

Our website is built using relative URLs; for example, the menu links
are "/Products/default.html" and "/Brands/default.html".

Once Nutch crawls the website, I cannot see that these anchors are
fetched, although I set the depth to 2. The resulting index contains
only 1 document.

If I run it against e.g. http://androidyou.blogspot.com , I can see that
the other URLs are fetched as well, and the links on that web site are
full URLs.

Is there any configuration setting for this?

I hope I was able to explain the issue clearly.

Kind regards,
Bahadir Cambel
