We have the following settings:

in crawl-urlfilter.txt:
+^http://test.mydomain.com

in regex-urlfilter.txt:
+^http://test.mydomain.com

in seed.txt:
http://test.mydomian.com/enGB/ProductLanding/Products.html

What you say made me realize: since I have the full URLs in this configuration, does the crawler drop the relative URLs as well? Is this what you mean? I thought there would be a setting for it. Can you give an example of a combination that would work?

Thanks

On Tue, Sep 21, 2010 at 4:42 PM, Thumuluri, Sai <[email protected]> wrote:

> Did you check the regex-url and crawl filters in the Nutch conf to make
> sure you are not excluding the relative URLs?
>
> -----Original Message-----
> From: Bahadir Cambel [mailto:[email protected]]
> Sent: Tuesday, September 21, 2010 10:35 AM
> To: [email protected]
> Subject: Relative urls are not crawled ?
>
> Hey guys,
>
> Our website is constructed using relative URLs; for example, the menu
> links are "/Products/default.html" and "/Brands/default.html".
>
> Once Nutch crawls the website, I cannot see that these anchors are
> fetched, although I set the depth to 2. The resulting index contains
> only 1 document.
>
> If I run it against e.g. http://androidyou.blogspot.com, the other URLs
> are fetched as well; as you can see, the links on that site are full
> URLs.
>
> Is there any configuration for this?
>
> Hope I was able to describe the issue clearly.
>
> Kind regards,
> Bahadir Cambel
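A quick sketch of the mechanics behind the question above: a crawler resolves each relative outlink (like "/Products/default.html") against the page's own URL before URL filters are applied, so the filters see absolute URLs. This is plain RFC 3986 resolution, shown here with Python's standard library rather than Nutch code; the page URL and regex below just mirror the thread's example and are assumptions, not the poster's actual config.

```python
import re
from urllib.parse import urljoin

# Hypothetical page URL and relative menu links, mirroring the thread's example site.
base = "http://test.mydomain.com/enGB/ProductLanding/Products.html"
links = ["/Products/default.html", "/Brands/default.html"]

# Relative outlinks are resolved against the page URL before filtering,
# so they become absolute URLs on the same host.
absolute = [urljoin(base, link) for link in links]
print(absolute)

# A prefix filter like +^http://test.mydomain.com therefore still matches
# the resolved links (here simulated with a plain regex).
pattern = re.compile(r"^http://test\.mydomain\.com")
print(all(pattern.match(u) for u in absolute))
```

If the resolved links match the allow pattern, as they do here, the filters are not what drops them; the mismatch would have to be elsewhere (for example, between the host in seed.txt and the host in the filter patterns).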

