Re: Can't Crawl Through Home Page, but crawling through inner page

Wed, 02 Mar 2011 05:22:18 -0800

Thanks to all

I made the following changes and it worked :-)


crawl-urlfilter.txt
# skip URLs containing certain characters as probable queries, etc.
#-[?*!@=]

# accept hosts in MY.DOMAIN.NAME
+^*magicbricks.com*
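
As far as I understand Nutch's regex URL filter, it checks the rules from top to bottom. Each pattern is searched anywhere in the URL, the first rule that matches decides (`+` accept, `-` reject), and a URL that matches no rule is rejected. That would explain why commenting out `-[?*!@=]` helped: with it active, any URL containing `?` or `=` (for example, query-string links on the home page) is rejected before it ever reaches the accept rule. The sketch below is a simplified Python model of that behaviour, not Nutch's own code. `re.search` stands in for Java's `Matcher.find()`. The accept pattern is simplified to `magicbricks\.com` because Python's `re` refuses to repeat the `^` anchor. The function names and URLs are hypothetical.

```python
import re


def parse_rules(text):
    """Turn filter-file text into (accept, compiled_regex) pairs, skipping comments and blanks."""
    rules = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        sign, pattern = line[0], line[1:]
        if sign not in "+-":
            raise ValueError(f"rule must start with + or -: {line!r}")
        rules.append((sign == "+", re.compile(pattern)))
    return rules


def accepts(rules, url):
    """First rule whose regex is found anywhere in the URL decides; no match means reject."""
    for accept, regex in rules:
        if regex.search(url):
            return accept
    return False


# Hypothetical rule sets modelled on crawl-urlfilter.txt; the accept rule is a
# simplified stand-in for the posted "+^*magicbricks.com*".
ORIGINAL = r"""
-[?*!@=]
+magicbricks\.com
"""

CHANGED = r"""
#-[?*!@=]
+magicbricks\.com
"""

home = "http://www.magicbricks.com/"                 # hypothetical URLs
listing = "http://www.magicbricks.com/search?city=1"

for name, text in (("original", ORIGINAL), ("changed", CHANGED)):
    rules = parse_rules(text)
    print(name, accepts(rules, home), accepts(rules, listing))
```

This prints `original True False` and then `changed True True`. In this model, the query-string URL only gets through once the `-[?*!@=]` rule is commented out.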


regex-urlfilter.txt
# skip URLs containing certain characters as probable queries, etc.
#-[?*!@=]

# accept hosts in MY.DOMAIN.NAME
-^*magicbricks.com*
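
Note that the only active rule in this file starts with `-`, so under first-match-wins semantics any tool that reads regex-urlfilter.txt would reject every magicbricks.com URL. Every other URL would also be dropped, because it matches no rule at all. Presumably the crawl still worked because, in Nutch 1.2 and earlier, the one-step `bin/nutch crawl` command reads crawl-urlfilter.txt instead. If the intent was to accept the domain here as well, as the `# accept hosts` comment suggests, the rule would need a leading `+`. The following is a minimal sketch of that evaluation, again using a simplified pattern and a hypothetical helper name:

```python
import re


def first_match_wins(rules, url):
    """Nutch-style evaluation (sketch): the first matching rule decides; no match rejects."""
    for sign, pattern in rules:
        if re.search(pattern, url):
            return sign == "+"
    return False


# regex-urlfilter.txt as posted: the query rule is commented out and the only
# active rule is a '-' rule for the domain (simplified from "-^*magicbricks.com*").
regex_urlfilter = [("-", r"magicbricks\.com")]

print(first_match_wins(regex_urlfilter, "http://www.magicbricks.com/"))  # → False (rejected by the '-' rule)
print(first_match_wins(regex_urlfilter, "http://example.org/"))          # → False (no rule matched)
```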


nutch-default.xml

<property>
  <name>http.redirect.max</name>
  <value>0</value>
  <description>The maximum number of redirects the fetcher will follow when
  trying to fetch a page. If set to negative or 0, fetcher won't immediately
  follow redirected URLs, instead it will record them for later fetching.
  </description>
</property>
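
According to the property description above, with `http.redirect.max` at 0 a redirecting home page produces no content in the current round. Its target is only recorded and fetched in a later round, provided the crawl depth allows another round and the target passes the URL filters. Nutch's convention is to put overrides like this in conf/nutch-site.xml and leave nutch-default.xml untouched. The sketch below models just the two behaviours the description names. The redirect map, the bare-domain-to-www redirect, and the `fetch_round` helper are all hypothetical, since the post does not say whether the home page actually redirects.

```python
def fetch_round(start, redirects, max_redirects):
    """Sketch of one fetch round for a single URL.

    Returns (content_url, deferred): content_url is the URL whose content was
    fetched this round (None if nothing was), deferred lists redirect targets
    recorded for a later round.
    """
    url = start
    hops = 0
    while url in redirects:
        target = redirects[url]
        if max_redirects <= 0:
            # Per the property description: don't follow immediately,
            # record the target for later fetching.
            return None, [target]
        if hops == max_redirects:
            # Limit reached; this sketch simply stops (Nutch's exact handling
            # of this case is not modelled here).
            return None, []
        url = target
        hops += 1
    return url, []


# Hypothetical: the bare domain redirects to the www host.
redirects = {"http://magicbricks.com/": "http://www.magicbricks.com/"}

print(fetch_round("http://magicbricks.com/", redirects, 0))
print(fetch_round("http://magicbricks.com/", redirects, 1))
```

With a limit of 0 the first call returns `(None, ['http://www.magicbricks.com/'])`, so the page content waits for the next round. With a limit of 1 the redirect is followed within the same round.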



Thanks again
Hemant Verma

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Can-t-Crawl-Through-Home-Page-but-crawling-through-inner-page-tp2601843p2611857.html
Sent from the Nutch - User mailing list archive at Nabble.com.
