Hi

Rather than a wide crawl of the web that tracks the current state of sites
(which, as I understand it, is what Nutch is currently optimized for), I am
interested in keeping copies of a more modest number of sites over time as
they change. In other words, I want to keep both the old versions of pages
and the new versions as they change. My overly optimistic hope is that I
could get close enough to this by simply adding the signature
(TextProfileSignature in particular) to the current id key. Any thoughts on
whether this is feasible and, if so, where in the codebase I should start
looking? I am aware that Heritrix specializes in archiving, but I would
really like to stick with Nutch if possible, unless it absolutely doesn't
make sense.
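
To make the idea concrete, here is a rough sketch of the kind of composite key
I have in mind. It is plain Java and does not use any Nutch classes: the
VersionedKey class, its method names, the "|" separator, and the MD5-over-
normalized-text signature are all assumptions for illustration only (the real
TextProfileSignature works from a profile of the page's text, not this
simplification).

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Locale;

// Hypothetical helper, not part of Nutch: builds an id of the form
// "<url>|<signature>" so that each distinct version of a page gets its own key.
class VersionedKey {

    // Stand-in signature (an assumption): hex MD5 of the lowercased,
    // whitespace-collapsed page text.
    static String signature(String text) {
        String normalized = text.toLowerCase(Locale.ROOT).trim().replaceAll("\\s+", " ");
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            byte[] digest = md.digest(normalized.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder();
            for (byte b : digest) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // MD5 is required to be available on every Java platform.
            throw new IllegalStateException(e);
        }
    }

    // The "|" separator is an arbitrary choice for this sketch.
    static String versionedId(String url, String text) {
        return url + "|" + signature(text);
    }

    public static void main(String[] args) {
        // Same URL, different content: two different ids, so both versions are kept.
        System.out.println(versionedId("http://example.com/", "first version of the page"));
        System.out.println(versionedId("http://example.com/", "second version of the page"));
    }
}
```

With a key like this, a re-fetch whose text is unchanged maps to the same id,
while a changed page produces a new id alongside the old one.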

Thanks

James
