Peter Swoboda wrote:
> Hi,
> what does the property
> 
> <property>
>   <name>db.default.fetch.interval</name>
>   <value>30</value>
>   <description>The default number of days between re-fetches of a page.
>   </description>
> </property>
> 
> exactly do?

URLs in the CrawlDb are scheduled to be refetched after a given interval.
The default is 30 days. This property sets that interval.

> Does it mean, that any changes on an injected url will be mentioned?
> Who(?)What re-fetches the page?

The Fetcher will, once the interval has expired.  This does not happen 
automatically; a fetch job has to be run.
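
To make the interval rule concrete, here is a minimal sketch of the
"is this url due again?" check. It is only an illustration, not Nutch's
actual code: the function name is_due and the constant
FETCH_INTERVAL_DAYS are made up for this example, and the only value
taken from the config is the 30 days.

```python
from datetime import datetime, timedelta

# Mirrors db.default.fetch.interval (days); hypothetical constant name.
FETCH_INTERVAL_DAYS = 30


def is_due(last_fetch: datetime, now: datetime,
           interval_days: int = FETCH_INTERVAL_DAYS) -> bool:
    """Return True once the refetch interval has expired for a url.

    Illustrative only: a due url is merely *eligible*. Nothing is
    fetched until a generate/fetch job is actually run.
    """
    return now >= last_fetch + timedelta(days=interval_days)


if __name__ == "__main__":
    fetched = datetime(2007, 1, 1)
    print(is_due(fetched, datetime(2007, 1, 15)))  # still inside the interval
    print(is_due(fetched, datetime(2007, 2, 15)))  # interval has expired
```

A page that changes on day 2 is therefore not picked up until a fetch
job runs after the interval has passed, unless you refetch it yourself.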

> What did i have to do, if i want nutch to mention (in the search results) 
> that an injected url is changed.
> Do i have to make a complete recrawl (like in the script)?

If you know specific urls have changed, you can create a fetch list of 
only those urls (through a separate crawldb and segments in a separate 
inject, generate, fetch process...don't use the same path).  Then you can 
merge those results using mergedb for the CrawlDb and mergesegs for the 
Segments.  You shouldn't have to do a full recrawl unless you don't know 
which pages were changed.
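
The steps above can be sketched as a dry-run command list. This only
builds and prints the commands, it does not run Nutch. The directory
layout, the function name and the argument order of each command are
my assumptions, so check them against the usage output of bin/nutch
for your version before running anything. The updatedb step is also an
assumption, added so the separate crawldb records the fetch before it
is merged.

```python
from typing import List


def targeted_recrawl_commands(url_dir: str, tmp_crawl: str, main_crawl: str,
                              merged_crawl: str, new_segment: str,
                              existing_segments: List[str]) -> List[List[str]]:
    """Build (not run) the commands for refetching only known-changed urls.

    All paths are hypothetical. new_segment is the directory name that
    the generate step creates under <tmp_crawl>/segments; it is only
    known after generate has run, so the caller fills it in.
    """
    nutch = "bin/nutch"
    tmp_db = f"{tmp_crawl}/crawldb"
    tmp_seg = f"{tmp_crawl}/segments/{new_segment}"
    return [
        # Separate crawldb and segments: do not reuse the main crawl path.
        [nutch, "inject", tmp_db, url_dir],
        [nutch, "generate", tmp_db, f"{tmp_crawl}/segments"],
        [nutch, "fetch", tmp_seg],
        [nutch, "updatedb", tmp_db, tmp_seg],  # assumed step, see note above
        # Merge the results into a new crawldb and a new segment directory.
        [nutch, "mergedb", f"{merged_crawl}/crawldb",
         f"{main_crawl}/crawldb", tmp_db],
        [nutch, "mergesegs", f"{merged_crawl}/segments",
         *existing_segments, tmp_seg],
    ]


if __name__ == "__main__":
    for cmd in targeted_recrawl_commands(
            "changed_urls", "crawl_tmp", "crawl", "crawl_merged",
            "NEW_SEGMENT", ["crawl/segments/OLD_SEGMENT"]):
        print(" ".join(cmd))
```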

Dennis Kubes
> 
> Thanks
> Peter
> 
> 
> 

_______________________________________________
Nutch-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-general