Re: Adaptive fetch interval & unmodified content detection, episode II

2006-01-06 Thread Doug Cutting
Andrzej Bialecki wrote: For efficiency reasons, most of this information is stored and passed to processing jobs inside instances of CrawlDatum; for the key step of DB update, the other parts of segments (such as Content, ParseData or ParseText) are not used, which prevents easy access to other
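To illustrate the constraint quoted above (the DB-update step sees only what travels inside CrawlDatum), here is a minimal Java sketch. It assumes a hypothetical record that carries a content signature and a fetch interval. The names `DatumSketch` and `AdaptiveUpdateSketch`, the rate constants, and the interval bounds are all hypothetical. They are not Nutch's API and not the code from the NUTCH-61 patch. The sketch only shows that unmodified-content detection and interval adjustment can be driven entirely by fields carried in the datum, without reading Content, ParseData or ParseText.

```java
import java.util.Objects;

/** Hypothetical, simplified stand-in for a CrawlDatum-like record. */
final class DatumSketch {
    final String signature;      // digest of the last fetched content (hypothetical field)
    final long fetchIntervalSec; // current re-fetch interval, in seconds

    DatumSketch(String signature, long fetchIntervalSec) {
        this.signature = signature;
        this.fetchIntervalSec = fetchIntervalSec;
    }
}

/** Hypothetical DB-update step: uses only the datum and the newly computed signature. */
final class AdaptiveUpdateSketch {
    // Illustrative tuning constants, not actual Nutch configuration values.
    static final double INC_RATE = 0.4;
    static final double DEC_RATE = 0.2;
    static final long MIN_INTERVAL_SEC = 60L;
    static final long MAX_INTERVAL_SEC = 365L * 24 * 3600;

    static DatumSketch update(DatumSketch old, String fetchedSignature) {
        // Unchanged signature means the page is treated as unmodified.
        boolean unmodified = Objects.equals(old.signature, fetchedSignature);
        // Back off when unmodified; re-fetch sooner when the content changed.
        double factor = unmodified ? 1.0 + INC_RATE : 1.0 - DEC_RATE;
        long next = Math.round(old.fetchIntervalSec * factor);
        // Keep the interval within fixed bounds.
        next = Math.max(MIN_INTERVAL_SEC, Math.min(MAX_INTERVAL_SEC, next));
        return new DatumSketch(fetchedSignature, next);
    }
}
```

Growing the interval multiplicatively when the signature is unchanged and shrinking it when the signature differs is just one simple policy. The clamping step prevents the interval from collapsing to near zero for frequently changing pages or growing without limit for static ones.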

Adaptive fetch interval & unmodified content detection, episode II

2005-12-30 Thread Andrzej Bialecki
Hi, I've been working on a set of patches to implement this functionality for the mapred branch. I have a workable solution now, but before I decide to commit it, I'd like to solicit some comments. Please see the latest patch available from JIRA NUTCH-61. Based on the past discussions, I de