Hi there, I dumped the contents of segment/fetchlist and segment/fetcher.
My question is: why isn't the MD5 signature of the page content saved in the fetchlist? I think storing it there would save CPU time when a page is unchanged, because we could then skip the parsing step. If the MD5 were in the fetchlist, the comparison could be done directly in memory. Because it is only in the fetcher output, we have to look it up in a local file in order to compare it with the MD5 of the newly fetched content.

Did I miss some important point, or is my dump wrong?

Thanks,
Michael Ji

----------------fetchlist--------------------
fetch: true
page: Version: 4
URL: http://www.sina.com/
ID: d6a83e9c17e05d5602709a63c241bf68
Next fetch: Sun Aug 21 20:15:06 CDT 2005
Retries since fetch: 0
Retry interval: 30 days
Num outlinks: 0
Score: 1.0
NextScore: 1.0
anchors: 0

----------------fetcher--------------------
fetch: true
page: Version: 4
URL: http://www.sina.com/
ID: d6a83e9c17e05d5602709a63c241bf68
Next fetch: Sun Aug 21 20:15:06 CDT 2005
Retries since fetch: 0
Retry interval: 30 days
Num outlinks: 0
Score: 1.0
NextScore: 1.0
anchors: 0
Fetch Result:
MD5Hash: 56eae3c2556cb10a00e7346738dcb318
ProtocolStatus: success(1), lastModified=0
FetchDate: Sun Aug 14 20:15:13 CDT 2005

_______________________________________________
Nutch-developers mailing list
https://lists.sourceforge.net/lists/listinfo/nutch-developers
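A minimal Java sketch of the in-memory check proposed in the question, assuming a fetchlist entry that carries the MD5 of the previous fetch. FetchEntry, previousMd5, md5Hex and shouldParse are hypothetical names for illustration only, not Nutch classes or methods; the digest itself comes from the standard java.security.MessageDigest.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

/**
 * Hypothetical sketch: if the fetchlist entry carried the previous content MD5,
 * the fetcher could decide in memory whether newly fetched content needs parsing.
 * None of these names are real Nutch APIs.
 */
public class Md5SkipCheck {

    /** Hypothetical fetchlist entry: URL plus previous content MD5 (hex), or null if never fetched. */
    record FetchEntry(String url, String previousMd5) {}

    /** Returns the lowercase hex MD5 digest of the given bytes. */
    static String md5Hex(byte[] content) {
        try {
            byte[] digest = MessageDigest.getInstance("MD5").digest(content);
            StringBuilder sb = new StringBuilder(digest.length * 2);
            for (byte b : digest) {
                sb.append(String.format("%02x", b));
            }
            return sb.toString();
        } catch (NoSuchAlgorithmException e) {
            // Every Java platform is required to provide MD5, so this should not happen.
            throw new IllegalStateException(e);
        }
    }

    /** True if the content must be parsed: first fetch, or digest differs from the stored one. */
    static boolean shouldParse(FetchEntry entry, byte[] newContent) {
        if (entry.previousMd5() == null) {
            return true; // no previous digest to compare against
        }
        return !entry.previousMd5().equals(md5Hex(newContent));
    }

    public static void main(String[] args) {
        byte[] page = "<html>hello</html>".getBytes(StandardCharsets.UTF_8);
        FetchEntry firstTime = new FetchEntry("http://www.sina.com/", null);
        FetchEntry unchanged = new FetchEntry("http://www.sina.com/", md5Hex(page));
        System.out.println(shouldParse(firstTime, page)); // true
        System.out.println(shouldParse(unchanged, page)); // false
    }
}
```

When the stored digest matches, the fetcher could skip parsing entirely; a missing digest (first fetch) always falls through to parsing, so the check never suppresses work it cannot justify.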
