If you want to restrict the crawl to this one domain, you also need to add
a "-." line after it, to reject every other domain.

Kevin

On Fri, 2008-06-27 at 00:17 +0530, kranthi reddy wrote:
> Hi Siddhartha,
>
> I am doing a whole-web crawl.
>
> Thanks for the response. I am now able to restrict the search to a
> particular domain. For this I changed db.ignore.external.links to
> true.
>
> But when I set the value to false and try changing
> conf/regex-urlfilter.txt by adding the line
>
> "+^http://([a-z0-9]*\.)*ibnlive.com/"
>
> it doesn't work.
>
> Thank you,
> Kranthi Reddy.B
>
> On Thu, Jun 26, 2008 at 11:54 PM, Siddhartha Reddy <[EMAIL PROTECTED]> wrote:
>
> > Hi Kranthi,
> >
> > Are you doing an intranet crawl (using the "bin/nutch crawl" command) or a
> > whole-web crawl (using the various other sub-commands of bin/nutch, for
> > example)? conf/crawl-urlfilter.txt is used only in the intranet crawl; you
> > need to use conf/regex-urlfilter.txt otherwise.
> >
> > Another effective way of restricting a crawl to the domains from the seed
> > list is to set the db.ignore.external.links property to true in
> > conf/nutch-site.xml. conf/nutch-default.xml includes a description of this
> > property.
> >
> > Best,
> > Siddhartha
> >
> > On Thu, Jun 26, 2008 at 11:31 PM, kranthi reddy <[EMAIL PROTECTED]>
> > wrote:
> >
> > > Hi,
> > >
> > > I am trying to crawl a fixed domain, say IBNLIVE.COM.
> > >
> > > I have changed my conf/crawl-urlfilter.txt. I have added the line
> > >
> > > "+^http://([a-z0-9]*\.)*ibnlive.com/ "
> > >
> > > But I don't know what is going on. I get results like
> > >
> > > "fetching http://www.google-analytics.com/urchin.js
> > > fetching http://www.josh18.com/showstory.php?id=236481
> > > fetching
> > > http://www.cricketnext.com/news/gambhir-raina-make-merry-as-bowlers-struggle/32395-13.html
> > > "
> > >
> > > I have given it in the format specified in the wiki/nutch site,
> > > but it doesn't seem to work.
> > >
> > > Someone please help me out.
> > >
> > > Thanking you
> > > kranthi reddy.b
> > >
> >
> > --
> > http://www.grok.in
> > "Ignorance killed the cat, curiosity was framed."
> >
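
To illustrate why the order of the rules matters, here is a rough Python sketch of a first-match-wins URL filter. It is not Nutch code. It assumes that the regex URL filter reads the rules top to bottom, that the first pattern found anywhere in the URL decides accept ("+") or reject ("-"), and that a URL matching no rule is rejected. It also assumes the stock regex-urlfilter.txt ends with a catch-all "+." line, which would explain why adding only the "+" rule changes nothing. The names parse_rules and accepts are made up for this example.

```python
import re


def parse_rules(text):
    """Parse '+regex' / '-regex' lines, skipping blanks and '#' comments."""
    rules = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        sign, pattern = line[0], line[1:]
        if sign not in "+-":
            raise ValueError("rule must start with '+' or '-': " + line)
        rules.append((sign == "+", re.compile(pattern)))
    return rules


def accepts(rules, url):
    """Return the verdict of the first rule whose pattern is found in url."""
    for accept, pattern in rules:
        if pattern.search(url):
            return accept
    # Assumption: a URL that matches no rule is rejected.
    return False


# Only the '+' rule added, in front of an assumed catch-all '+.' line.
WITHOUT_REJECT = parse_rules(r"""
+^http://([a-z0-9]*\.)*ibnlive.com/
+.
""")

# The same '+' rule, followed by '-.' as suggested above.
WITH_REJECT = parse_rules(r"""
+^http://([a-z0-9]*\.)*ibnlive.com/
-.
+.
""")

URLS = [
    "http://www.ibnlive.com/news/",
    "http://www.google-analytics.com/urchin.js",
    "http://www.josh18.com/showstory.php?id=236481",
]

for url in URLS:
    print(url, accepts(WITHOUT_REJECT, url), accepts(WITH_REJECT, url))
```

With the "-." line in place, the other domains are rejected before the catch-all "+." is ever reached, so only ibnlive.com URLs get through.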
