If you want to restrict the crawl to this one domain, you also need to add

-.

on the line after it, to exclude every other domain.
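
To see why the order of the two lines matters, here is a rough Python sketch of how I understand the filter file to be evaluated. It is not Nutch code. It assumes the rules are tried top to bottom, that the first pattern found anywhere in the URL decides, and that a URL matching no rule is dropped. The names parse_rules, accepts, RESTRICTED_RULES and CATCH_ALL_RULES are made up for the illustration, and the "+." catch-all in the second rule set is my assumption about what the stock file ends with.

```python
import re

# Hypothetical rule sets, written the way they would appear in the filter file.
RESTRICTED_RULES = r"""
# accept ibnlive.com and its subdomains
+^http://([a-z0-9]*\.)*ibnlive.com/
# skip everything else
-.
"""

# Assumption: a file that still ends with an accept-everything rule.
CATCH_ALL_RULES = r"""
+^http://([a-z0-9]*\.)*ibnlive.com/
+.
"""


def parse_rules(text):
    """Turn '+regex' / '-regex' lines into (allow, compiled_pattern) pairs."""
    rules = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # ignore blank lines and comments
        sign, pattern = line[0], line[1:]
        if sign not in "+-":
            raise ValueError("rule must start with '+' or '-': " + line)
        rules.append((sign == "+", re.compile(pattern)))
    return rules


def accepts(rules, url):
    """Return the verdict of the first rule whose pattern is found in the URL."""
    for allow, pattern in rules:
        if pattern.search(url):
            return allow
    # Assumption: a URL that matches no rule is rejected.
    return False


if __name__ == "__main__":
    urls = [
        "http://www.ibnlive.com/news",
        "http://www.google-analytics.com/urchin.js",
    ]
    for name, text in (("restricted", RESTRICTED_RULES), ("catch-all", CATCH_ALL_RULES)):
        rules = parse_rules(text)
        for url in urls:
            print(name, url, accepts(rules, url))
```

With the "-." line in place, only the ibnlive.com URL gets through. If an accept-everything rule follows the "+" line instead, the extra "+" line changes nothing, because every other URL is still accepted further down.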

Kevin

On Fri, 2008-06-27 at 00:17 +0530, kranthi reddy wrote:
> hi siddhartha,
> 
>    I am doing a whole web crawl.
> 
>   Thanks for the response. I am now able to restrict the search to a
> particular domain. For this I changed db.ignore.external.links to
> true.
> 
> But when I set the value back to false and
> try changing conf/regex-urlfilter.txt by adding the line
> 
>      "+^http://([a-z0-9]*\.)*ibnlive.com/"
> 
> it doesn't work.
> 
> 
> Thank you
> Kranthi Reddy.B
> 
> On Thu, Jun 26, 2008 at 11:54 PM, Siddhartha Reddy <[EMAIL PROTECTED]> wrote:
> 
> > Hi Kranthi,
> >
> > Are you doing an intranet crawl (using the "bin/nutch crawl" command) or a
> > whole-web crawl (using the various other sub-commands of bin/nutch, for
> > example)? conf/crawl-urlfilter.txt is used only in the intranet crawl; you
> > need to use conf/regex-urlfilter.txt otherwise.
> >
> > Another effective way of restricting a crawl to the domains from the seed
> > list is to set the db.ignore.external.links property to true in
> > conf/nutch-site.xml. conf/nutch-default.xml includes a description of this
> > property.
> >
> > Best,
> > Siddhartha
> >
> > On Thu, Jun 26, 2008 at 11:31 PM, kranthi reddy <[EMAIL PROTECTED]>
> > wrote:
> >
> > > Hi ,
> > >
> > >  I am trying to crawl a fixed domain ... say IBNLIVE.COM ...
> > >
> > >  I have changed my conf/crawl-urlfilter.txt . I have added the line
> > >
> > >  "+^http://([a-z0-9]*\.)*ibnlive.com/ "
> > >
> > >
> > >   But I don't know what is going on. I get results like
> > >
> > >  "fetching http://www.google-analytics.com/urchin.js
> > >   fetching http://www.josh18.com/showstory.php?id=236481
> > >   fetching http://www.cricketnext.com/news/gambhir-raina-make-merry-as-bowlers-struggle/32395-13.html"
> > >
> > >
> > >   I have given it in the format specified in the wiki/nutch site....
> > >   But it doesn't seem to work...
> > >
> > >  Someone please help me out...
> > >
> > > Thanking you
> > > kranthi reddy.b
> > >
> >
> > --
> > http://www.grok.in
> > "Ignorance killed the cat, curiosity was framed."
> >
