Thank you Doug,
but I think I wasn't clear enough in my first question.
My question was: where are those two lists stored? I know that one of the lists is stored in db/webdb/. Where is the other list stored?
Thank you again,
- Sean
~ Sean Lee
From: Doug Cutting <[EMAIL PROTECTED]>
Reply-To: [EMAIL PROTECTED]
To: [EMAIL PROTECTED]
Subject: Re: [Nutch-dev] keeping two different fetch list
Date: Tue, 11 May 2004 09:16:59 -0700
The point of this is to make it so that you can generate a fetchlist, start fetching it, then generate another before you've updated the database with output of the first. Otherwise the second fetchlist would contain the same pages as the first, as they'd still be due to be fetched. If you don't update the database with the output of the first fetch within a week, then they will be re-generated into a new fetchlist. But, if you update with the output of the first fetch within a week then the pages' next-fetch date will be reset to the value it would normally have (the current fetch date + the page's fetch interval).
In other words, this lets you concurrently fetch and update the database. (The seven-day constant should really be a config parameter, and it should probably never be less than the default fetch interval...)
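The scheduling behaviour described above can be sketched as a small in-memory model. This is only an illustration, not the actual FetchListTool code: the class FetchScheduleSketch, its methods, and the map standing in for the webdb are hypothetical, and the 30-day fetch interval is an assumed value. Only the seven-day push-forward is taken from the explanation above.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model of the scheduling idea; not Nutch's real classes.
class FetchScheduleSketch {
    static final long DAY_MS = 24L * 60 * 60 * 1000;
    // The seven-day constant mentioned in the message.
    static final long GENERATE_DELAY_MS = 7 * DAY_MS;
    // Assumed fetch interval, for illustration only.
    static final long FETCH_INTERVAL_MS = 30 * DAY_MS;

    // Stand-in for the webdb: URL -> next-fetch time in milliseconds.
    private final Map<String, Long> nextFetch = new HashMap<>();

    void addPage(String url, long nextFetchTime) {
        nextFetch.put(url, nextFetchTime);
    }

    // Build a fetchlist of all pages due at 'now', and push each selected
    // page's next-fetch date forward a week so that a second generate call
    // does not select it again before the database has been updated.
    List<String> generate(long now) {
        List<String> fetchlist = new ArrayList<>();
        for (Map.Entry<String, Long> e : nextFetch.entrySet()) {
            if (e.getValue() <= now) {
                fetchlist.add(e.getKey());
                e.setValue(now + GENERATE_DELAY_MS);
            }
        }
        return fetchlist;
    }

    // Update the database with fetch output: the next-fetch date is reset
    // to the fetch date plus the page's fetch interval.
    void update(String url, long fetchTime) {
        nextFetch.put(url, fetchTime + FETCH_INTERVAL_MS);
    }

    long nextFetchTime(String url) {
        return nextFetch.get(url);
    }

    public static void main(String[] args) {
        FetchScheduleSketch db = new FetchScheduleSketch();
        db.addPage("http://example.com/", 0L);
        System.out.println(db.generate(0L));      // [http://example.com/]
        System.out.println(db.generate(DAY_MS));  // [] (already pushed forward)
        db.update("http://example.com/", 2 * DAY_MS);
        System.out.println(db.nextFetchTime("http://example.com/") / DAY_MS); // 32
    }
}
```

In this model, a page that is generated but never updated becomes due again once the week has passed, which matches the "re-generated into a new fetchlist" case described above.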
Doug
Sean Lee wrote:
>in line 540 of java.net.nutch.tools.FetchListTool.java,
>
>it comments that:
>
>"//Modify the Page in the webdb so that its date is set forward a
>week. This way, we can have generate two consecutive different
>fetchlists without an intervening update"
>
>My first Question is, what is "two consecutive different fetchlists"
>referring to exactly? Is it referring to the data structures in
>db/webdb/... and segments/200405xxxxxx/fetchlist/data?
>
>And my second question is: how exactly does the time stamp play a
>role in making that possible without an intervening update?
>
>Thank you,
>
>
>
> ~ Sean Lee
-------------------------------------------------------
This SF.Net email is sponsored by Sleepycat Software
Learn developer strategies Cisco, Motorola, Ericsson & Lucent use to deliver
higher performing products faster, at low TCO.
http://www.sleepycat.com/telcomwpreg.php?From=osdnemail3
_______________________________________________
Nutch-developers mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/nutch-developers