Michael,

Thanks for the heads-up about the db.max.outlinks.per.page property. I'll have to go back and check my logs etc.

Regards,
Paul.

-----Original Message-----
From: Michael Nebel [mailto:[EMAIL PROTECTED]
Sent: 16 September 2005 08:31
To: [email protected]
Subject: Re: Whole web search depth

Hi Paul,

in the first iteration of the crawl, you get the start pages. While each document is parsed (done automatically by "nutch fetch"), the outlinks are identified. An outlink is every link, including internal links. With "updatedb", they are injected into the webdb, and "nutch generate" adds them to the partition. Perhaps you can check with "nutch fetchlist". I'm pretty sure about this part, because I see more than the start pages :-)

I don't know what "db.ignore.internal.links" is for. I would guess it's used in the context of the analyze.

The more interesting parameter for you should be "db.max.outlinks.per.page", because it limits the number of outlinks used per page.

Regards

	Michael

Paul Williams wrote:
> Michael,
>
> Thanks for the reply. I guess what I'm really asking is how do I
> crawl more than just the home page of a site? Looking at
> nutch-default.xml there is a property named db.ignore.internal.links, so
> do I just set it to false and get more in-depth searching?
>
> Thanks for any advice.
> Paul.
>
> -----Original Message-----
> From: Michael Nebel [mailto:[EMAIL PROTECTED]
> Sent: 14 September 2005 10:05
> To: [email protected]
> Subject: Re: Whole web search depth
>
> Hi Paul,
>
> just call the "generate - fetch - updatedb" loop as often as you want.
> :-)
>
> Perhaps the parameter "depth" is the wrong name and causes the
> confusion. Depth does not mean that the crawler follows one link to a
> depth of x and then takes the next link. Depth means the number of
> times the loop "generate - fetch - updatedb" is run. Just take a look
> at the output of the crawl. The result of calling the loop is (should be)
> the same as if you follow one link to the depth of x!
>
> Regards
>
> 	Michael
>
> Paul Williams wrote:
>
>>Hi,
>>
>>I'm fairly new to using Nutch and so this is probably a newbie question
>>(I've already looked in the mailing lists and can't see an answer).
>>
>>I'm trying to do a web search (limited to around 10 sites at the moment)
>>but I'm unsure how to set the depth of searching. How is this done?
>>
>>Cheers.
>

--
Michael Nebel
http://www.nebel.de/
http://www.netluchs.de/
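
Illustration of the thread's main point: "depth" counts rounds of the "generate - fetch - updatedb" loop, so a depth of 1 only ever reaches the start pages. The sketch below is a toy simulation in Python, not Nutch code. The link graph, the crawl() function and its max_outlinks_per_page argument are hypothetical names invented for this example; the argument only loosely mimics what the thread says about "db.max.outlinks.per.page".

```python
from typing import Dict, List, Set

# Hypothetical toy link graph: page -> outlinks (internal links included).
LINKS: Dict[str, List[str]] = {
    "site/": ["site/a", "site/b"],
    "site/a": ["site/a1", "site/b"],
    "site/b": ["site/b1"],
    "site/a1": [],
    "site/b1": ["site/b2"],
    "site/b2": [],
}


def crawl(seeds: List[str], depth: int,
          max_outlinks_per_page: int = 100) -> List[Set[str]]:
    """Run `depth` rounds of a generate / fetch / updatedb style loop.

    Returns the set of pages fetched in each round.
    """
    known: Set[str] = set(seeds)   # stands in for the webdb (assumption)
    fetched: Set[str] = set()
    rounds: List[Set[str]] = []
    for _ in range(depth):
        # "generate": pick every known page that has not been fetched yet
        fetchlist = known - fetched
        # "fetch": retrieve the pages and collect outlinks, capped per page
        outlinks: Set[str] = set()
        for page in sorted(fetchlist):
            outlinks.update(LINKS.get(page, [])[:max_outlinks_per_page])
        fetched |= fetchlist
        # "updatedb": record the newly discovered links for the next round
        known |= outlinks
        rounds.append(fetchlist)
    return rounds


if __name__ == "__main__":
    for i, pages in enumerate(crawl(["site/"], depth=3), start=1):
        print(f"round {i}: {sorted(pages)}")
```

With depth=1 only "site/" is fetched; with depth=3 the second round fetches "site/a" and "site/b", and the third fetches "site/a1" and "site/b1". Calling crawl(["site/"], depth=3, max_outlinks_per_page=1) never reaches "site/b" in this toy graph, which is the kind of effect a per-page outlink cap can have.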
