Andrzej Bialecki wrote:
For efficiency reasons, most of this information is stored and passed to
processing jobs inside instances of CrawlDatum. For the key step of DB
update, any other parts of segments (such as Content, ParseData or
ParseText) are not used, which prevents easy access to other
Hi,
I've been working on a set of patches to implement this functionality
for the mapred branch.
I have a workable solution now, but before I decide to commit it, I'd
like to solicit some comments. Please see the latest patch available
from JIRA NUTCH-61.
Based on the past discussions, I de