Thank you for answering, John:

That property won't help me. When it is set to false, updatedb only
updates information about the URLs already in the database (the
injected ones) and does not add the URLs discovered during the
generate-fetch-update cycle, so a depth-5 crawl with this property set
to false is effectively five depth-1 crawls. I'm looking for a way to
tell the crawl database to generate pages only from the injected URLs
and the URLs reached during that same crawl operation, and then, when
a new crawl is launched, to start cycling again from just the injected
URLs, expanding to the desired depth from those seed URLs.
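
What I may end up doing instead (just a sketch, not tested; the paths
"seeds", "crawl/crawldb" and "crawl/segments" are only placeholders,
and the exact commands depend on the Nutch version) is to throw the
CrawlDb away and re-inject the seeds before every run, so each crawl
can only expand outward from the seed URLs:

  # wipe the old CrawlDb so this run starts only from the seed list
  rm -rf crawl/crawldb

  # inject the seed URLs into a fresh CrawlDb
  bin/nutch inject crawl/crawldb seeds

  # one generate / fetch / updatedb cycle per depth level (depth 2 here)
  for i in 1 2; do
    bin/nutch generate crawl/crawldb crawl/segments
    segment=crawl/segments/`ls crawl/segments | sort | tail -1`
    bin/nutch fetch $segment       # a separate parse step may also be
                                   # needed, depending on fetcher settings
    bin/nutch updatedb crawl/crawldb $segment
  done

The obvious drawback is that scores and fetch-interval data from
earlier runs are lost, because the CrawlDb is rebuilt from scratch on
every run.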

Again, thank you for answering.

Ismael

2007/8/25, John Mendenhall <[EMAIL PROTECTED]>:
> > In the first crawl I have no problems, but when I recrawl, my crawl
> > database still contains pages and links from the previous operation,
> > so if I first crawl with depth 1 and later recrawl with depth 1 again,
> > it is like a depth-2 crawl. For example:
> >
> > I do a depth-1 crawl on www.fgfgfgfgfgfgf.com; it recovers
> > information from that page, and in that information there is a link to
> > www.vbvbvbvbvbvbvbvb.com. When I recrawl with depth 1 again, it
> > crawls from the first site and also from the second one, which was
> > added in the first crawl. So it is as if I had made a depth-2 crawl
> > on the first site, not a depth-1 recrawl.
>
> I think you are looking for this property setting:
>
> <property>
>   <name>db.update.additions.allowed</name>
>   <value>false</value>
>   <description>If true, updatedb will add newly discovered URLs, if false
>   only already existing URLs in the CrawlDb will be updated and no new
>   URLs will be added.
>   </description>
> </property>
>
> I hope that helps.
>
> JohnM
>
> --
> john mendenhall
> [EMAIL PROTECTED]
> surf utopia
> internet services
>
