Hi,

Rather than doing a wide crawl of the web that keeps track of the current state of sites (which, as I understand it, is what Nutch is currently optimized for), I am interested in keeping copies of a more modest number of sites over time as they change. In other words, I want to keep both the old pages and the new pages as they change. My overly optimistic hope is that I could get close enough to this by simply appending the page signature (TextProfileSignature in particular) to the current id key. Any thoughts on whether this is feasible, and if so, where in the codebase I should start looking? I am aware that Heritrix specializes in archiving, but I would really like to stick with Nutch unless it absolutely doesn't make sense.
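To make the idea concrete, here is a minimal standalone sketch of the kind of key I mean. It is not Nutch code and does not use Nutch's API. MD5 over the page text stands in for TextProfileSignature, and the `versionedKey` method and the `#` separator are hypothetical choices for illustration only.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class VersionedKey {
    // Stand-in signature: exact MD5 over the page text.
    // (Assumption: TextProfileSignature would be fuzzier and ignore small changes.)
    public static byte[] signature(String text) {
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            return md.digest(text.getBytes(StandardCharsets.UTF_8));
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 not available", e);
        }
    }

    // Lowercase hex encoding of the signature bytes.
    public static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder(bytes.length * 2);
        for (byte b : bytes) {
            sb.append(String.format("%02x", b));
        }
        return sb.toString();
    }

    // Hypothetical composite key: URL + '#' + hex signature, so each
    // distinct version of a page gets its own record instead of overwriting the old one.
    public static String versionedKey(String url, String text) {
        return url + "#" + toHex(signature(text));
    }

    public static void main(String[] args) {
        String url = "http://example.com/";
        String v1 = versionedKey(url, "old page text");
        String v2 = versionedKey(url, "new page text");
        System.out.println(v1);
        System.out.println(v2);
        // Changed content yields a different key, so both versions are kept.
        System.out.println(v1.equals(v2));
    }
}
```

Because the hex suffix has a fixed length (32 characters for MD5), the original URL can always be recovered by stripping the last 33 characters, even if the URL itself contains `#`. With a fuzzy signature such as TextProfileSignature, trivial edits would map to the same key, which may be exactly the behavior wanted for archiving.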
Thanks,
James

