Peter Swoboda wrote:
> Hi,
> what does the property
>
> <property>
>   <name>db.default.fetch.interval</name>
>   <value>30</value>
>   <description>The default number of days between re-fetches of a page.
>   </description>
> </property>
>
> exactly do?

URLs in the CrawlDb are set to be refetched after a given interval. The default is 30 days. This property sets that interval.

> Does it mean, that any changes on an injected url will be mentioned?
> Who(?)What re-fetches the page?

The Fetcher will, once the interval has expired. This does not happen automatically; a fetch job has to be run.

> What did i have to do, if i want nutch to mention (in the search results)
> that an injected url is changed.
> Do i have to make a complete recrawl (like in the script)?

If you know that specific URLs have changed, you can create a fetch list of only those URLs (through a separate crawldb and segments on a separate inject, generate, fetch process... don't use the same path). Then you can merge those results using mergedb for the CrawlDb and mergesegs for the segments. You shouldn't have to do a full recrawl unless you don't know which pages were changed.

Dennis Kubes

> Thanks
> Peter
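To make the interval rule concrete, here is a minimal sketch of the arithmetic only. It is not Nutch's actual code: the function names and the dates are made up for illustration, and the only value taken from the thread is the 30-day default of db.default.fetch.interval.

```python
from datetime import datetime, timedelta

# Mirrors the <value> of db.default.fetch.interval quoted above (in days).
DEFAULT_FETCH_INTERVAL_DAYS = 30


def next_fetch_time(last_fetch, interval_days=DEFAULT_FETCH_INTERVAL_DAYS):
    """Hypothetical helper: earliest time a page becomes eligible for refetch."""
    return last_fetch + timedelta(days=interval_days)


def is_due(last_fetch, now, interval_days=DEFAULT_FETCH_INTERVAL_DAYS):
    """True once the interval has expired. Being due only makes the page
    eligible; it is fetched again only when a fetch job is actually run."""
    return now >= next_fetch_time(last_fetch, interval_days)


# Illustrative dates, not taken from the thread.
last = datetime(2007, 1, 1)
print(is_due(last, datetime(2007, 1, 15)))  # False: only 14 days have passed
print(is_due(last, datetime(2007, 2, 1)))   # True: 31 days have passed
```

A page that changes on day 2 would, under this rule, not be picked up by a normal run until day 30, which is why a separate, targeted fetch of known-changed URLs is the quicker route.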
