The point of this is to let you generate a fetchlist, start fetching it, and then generate another before you've updated the database with the output of the first. Otherwise the second fetchlist would contain the same pages as the first, since they would still be due to be fetched. If you don't update the database with the output of the first fetch within a week, those pages will be regenerated into a new fetchlist. But if you do update with the output of the first fetch within a week, the pages' next-fetch dates will be reset to the values they would normally have (the current fetch date plus the page's fetch interval).
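To make the timing concrete, here is a toy model of the trick in Java. It is a hypothetical sketch, not the actual FetchListTool code: the class, its fields and methods, the example URLs, and the use of whole days as the time unit are all assumptions made for illustration. Generating a fetchlist pushes each selected page's next-fetch date forward seven days; an update resets it to the fetch date plus the page's fetch interval.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/**
 * Toy model of the "set forward a week" trick. NOT Nutch's FetchListTool:
 * all names here are hypothetical, and times are whole days for readability.
 */
public class FetchScheduleSketch {
    // Mirrors the hard-coded "set forward a week" in the real tool.
    static final int LOCK_DAYS = 7;

    static class Page {
        final String url;
        final int fetchIntervalDays; // how often this page should normally be refetched
        int nextFetchDay;            // day on which the page is next due

        Page(String url, int fetchIntervalDays, int nextFetchDay) {
            this.url = url;
            this.fetchIntervalDays = fetchIntervalDays;
            this.nextFetchDay = nextFetchDay;
        }
    }

    // Stand-in for the web database; TreeMap keeps fetchlists in URL order.
    final Map<String, Page> webdb = new TreeMap<>();

    void addPage(String url, int intervalDays, int nextFetchDay) {
        webdb.put(url, new Page(url, intervalDays, nextFetchDay));
    }

    /** Generate: select every due page and push its next-fetch date forward a week. */
    List<String> generate(int today) {
        List<String> fetchlist = new ArrayList<>();
        for (Page p : webdb.values()) {
            if (p.nextFetchDay <= today) {
                fetchlist.add(p.url);
                // Any generate run in the next seven days will now skip this page.
                p.nextFetchDay = today + LOCK_DAYS;
            }
        }
        return fetchlist;
    }

    /** Update: record a completed fetch, restoring the page's normal schedule. */
    void update(String url, int fetchDay) {
        Page p = webdb.get(url);
        if (p == null) {
            return; // unknown URL: nothing to reschedule in this sketch
        }
        p.nextFetchDay = fetchDay + p.fetchIntervalDays;
    }
}
```

For example, with two pages due on day 0 and a 30-day interval, `generate(0)` returns both, `generate(1)` returns nothing, and after `update("http://a/", 1)` a generate on day 7 returns only the page whose fetch output was never applied.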

In other words, this lets you concurrently fetch and update the database. (The seven-day constant should really be a config parameter, and it should probably never be less than the default fetch interval...)
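The parenthetical suggestion can be sketched the same way. The property name `fetchlist.lock.days` below is a hypothetical placeholder, not a real Nutch setting, and the default fetch interval is simply passed in by the caller rather than read from any particular config key. The sketch keeps seven days as the fallback and never returns less than the default fetch interval.

```java
import java.util.Properties;

/** Hypothetical sketch: making the seven-day lock period configurable. */
public class LockPeriodConfig {
    // Hypothetical property name; not an actual Nutch configuration key.
    static final String LOCK_DAYS_KEY = "fetchlist.lock.days";
    // The value currently hard-coded in the tool.
    static final int FALLBACK_LOCK_DAYS = 7;

    /**
     * Returns how many days a generated page stays hidden from later
     * fetchlists: the configured value (or 7 if unset), but never less
     * than the default fetch interval.
     */
    static int lockDays(Properties conf, int defaultFetchIntervalDays) {
        String raw = conf.getProperty(LOCK_DAYS_KEY);
        int configured = (raw == null) ? FALLBACK_LOCK_DAYS : Integer.parseInt(raw.trim());
        return Math.max(configured, defaultFetchIntervalDays);
    }
}
```

Clamping silently is one design choice; rejecting a too-small value with an error at startup would be an equally reasonable alternative.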

Doug

Sean Lee wrote:
In line 540 of java.net.nutch.tools.FetchListTool.java, the comment says:

"//Modify the Page in the webdb so that its date is set forward a week. This way, we can have generate two consecutive different fetchlists without an intervening update"

My first question is: what exactly does "two consecutive different fetchlists" refer to? Does it refer to the data structures in db/webdb/... and segments/200405xxxxxx/fetchlist/data?

And my second question is: how exactly does the timestamp make it possible to generate them without an intervening update?

Thank you,



~ Sean Lee
-------------------------------------------------------
This SF.Net email is sponsored by Sleepycat Software
Learn developer strategies Cisco, Motorola, Ericsson & Lucent use to deliver
higher performing products faster, at low TCO.
http://www.sleepycat.com/telcomwpreg.php?From=osdnemail3
_______________________________________________
Nutch-developers mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/nutch-developers

