pages: <url, <status, contentHash, lastFetchDate, numFailures> >
Is this list of storable fields extendable by plugins?
Sure, I don't see why not. We have base classes that supply the fields required by the generic fetchlist generation and fetcher code, but there's no reason those couldn't be extended for particular applications. One caution: making this structure bigger will slow step (3), the page db update.
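To make the extension idea concrete, here is a minimal Java sketch that assumes a write/readFields style of binary serialization. The class names `PageEntry`, `ChangeTrackingPageEntry` and `PageEntryDemo`, the byte-sized `status`, and the 16-byte hash are all hypothetical, not Nutch's actual classes. The subclass appends a `lastChangedDate` after the base fields, so generic code can still treat it as a `PageEntry`, but every record grows by 8 bytes, and the db update pays that cost on each pass.

```java
import java.io.*;

/** Hypothetical base record mirroring <url, <status, contentHash, lastFetchDate, numFailures>>. */
class PageEntry {
    String url;
    byte status;                         // assumed to fit in one byte
    byte[] contentHash = new byte[16];   // assumed 16-byte (MD5-sized) hash
    long lastFetchDate;                  // millis since epoch
    int numFailures;

    void write(DataOutput out) throws IOException {
        out.writeUTF(url);               // 2-byte length prefix + modified UTF-8
        out.writeByte(status);
        out.write(contentHash);
        out.writeLong(lastFetchDate);
        out.writeInt(numFailures);
    }

    void readFields(DataInput in) throws IOException {
        url = in.readUTF();
        status = in.readByte();
        in.readFully(contentHash);
        lastFetchDate = in.readLong();
        numFailures = in.readInt();
    }
}

/** Hypothetical application-specific extension that appends one field. */
class ChangeTrackingPageEntry extends PageEntry {
    long lastChangedDate;

    @Override
    void write(DataOutput out) throws IOException {
        super.write(out);                // base fields first, unchanged layout
        out.writeLong(lastChangedDate);
    }

    @Override
    void readFields(DataInput in) throws IOException {
        super.readFields(in);
        lastChangedDate = in.readLong();
    }
}

public class PageEntryDemo {
    /** Number of bytes one record occupies when serialized. */
    static int serializedSize(PageEntry e) throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        e.write(new DataOutputStream(bytes));
        return bytes.size();
    }

    public static void main(String[] args) throws IOException {
        PageEntry base = new PageEntry();
        base.url = "http://example.com/";
        ChangeTrackingPageEntry ext = new ChangeTrackingPageEntry();
        ext.url = base.url;
        System.out.println(serializedSize(base) + " vs " + serializedSize(ext));
    }
}
```

Running `PageEntryDemo` prints `50 vs 58`: a single added long makes this record 16% larger, and the page db update reads and rewrites every record.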
For example, it might be interesting to monitor changes on websites and prefer more up-to-date pages in ranking.
So you'd add a lastChangedDate?
In this case, for example, I would add fields describing the content, so that changes can be computed when the page is fetched again. I would also store the calculated result as a value measuring the amount of change over time.
Would you require more than a hash of the content? Again, storing large data in this file will substantially slow the db update.
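As a rough illustration that a hash may be enough, here is a hypothetical Java sketch (`ChangeRate` is not a Nutch class) that derives a change rate from successive content hashes. Beyond the hash and fetch date already in the record, it adds only a first-fetch date and a change counter to the per-page state. MD5 via `java.security.MessageDigest` is an assumed choice of hash.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Arrays;

/** Hypothetical per-page change statistics derived from content hashes only. */
public class ChangeRate {
    private byte[] lastHash;       // hash of the most recently fetched content
    private long firstFetchDate;   // millis since epoch of the first fetch
    private long lastFetchDate;    // millis since epoch of the latest fetch
    private int numChanges;        // fetches whose hash differed from the previous one

    static byte[] md5(byte[] content) {
        try {
            return MessageDigest.getInstance("MD5").digest(content);
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e); // Java SE platforms are required to provide MD5
        }
    }

    /** Records a fetch; returns true if the content changed since the previous fetch. */
    boolean recordFetch(byte[] content, long fetchDate) {
        byte[] hash = md5(content);
        boolean changed = lastHash != null && !Arrays.equals(hash, lastHash);
        if (lastHash == null) firstFetchDate = fetchDate;
        if (changed) numChanges++;
        lastHash = hash;
        lastFetchDate = fetchDate;
        return changed;
    }

    /** Observed changes per day over the fetch history; 0 if no time has elapsed. */
    double changesPerDay() {
        double days = (lastFetchDate - firstFetchDate) / 86_400_000.0;
        return days > 0 ? numChanges / days : 0.0;
    }

    public static void main(String[] args) {
        ChangeRate r = new ChangeRate();
        long day = 86_400_000L;
        r.recordFetch("v1".getBytes(StandardCharsets.UTF_8), 0);
        r.recordFetch("v1".getBytes(StandardCharsets.UTF_8), 2 * day);
        r.recordFetch("v2".getBytes(StandardCharsets.UTF_8), 4 * day);
        System.out.println(r.changesPerDay()); // prints 0.25
    }
}
```

Because at most one change is detected per fetch interval, this estimate is a lower bound when a page changes more often than it is fetched.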
Doug
Nutch-developers mailing list https://lists.sourceforge.net/lists/listinfo/nutch-developers
