Are you pushing it into a search index of some sort?

Since I mostly push things into Solr, I would modify the key to take the signature into
account.
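A minimal sketch of that idea, written in Python for brevity even though Nutch itself is Java. The helper names, the `<url>#<hex signature>` id format, and the dict standing in for a Solr core are all hypothetical. A plain MD5 over the raw text also stands in for TextProfileSignature, which instead hashes a normalized word-frequency profile of the page.

```python
import hashlib

def content_signature(text: str) -> bytes:
    # Hypothetical stand-in for a Nutch signature: plain MD5 of the raw text.
    # TextProfileSignature is fuzzier and may ignore small edits.
    return hashlib.md5(text.encode("utf-8")).digest()

def versioned_doc_id(url: str, signature: bytes) -> str:
    # Hypothetical id format "<url>#<hex signature>". The hex part never
    # contains '#', so rsplit("#", 1) recovers the URL and signature.
    return f"{url}#{signature.hex()}"

# Stands in for a Solr core whose uniqueKey is the "id" field.
index = {}

def index_page(url: str, text: str) -> str:
    doc_id = versioned_doc_id(url, content_signature(text))
    # Same content -> same id -> the document overwrites itself.
    # Changed content -> new id -> the old version stays in the index.
    index[doc_id] = {"id": doc_id, "url": url, "content": text}
    return doc_id
```

Indexing the same URL twice with unchanged text leaves one document, while changed text adds a second one alongside it. Keeping the plain URL in its own field lets you query every stored version of a page. Because TextProfileSignature is deliberately tolerant of minor changes, small edits may not produce a new version, which may or may not be what you want for archiving.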

On 9 Oct 2012, at 11:17, <[email protected]> wrote:

> Hi
> 
> Rather than a wide crawl of the web keeping track of the current state of
> sites (which, as I understand it, is what Nutch is currently optimized for),
> I am interested in keeping copies of a more modest number of sites over time
> as they change. In other words, I want to keep both the old pages and the
> new pages as they change. My overly optimistic wishful thinking is that I
> could get close enough to this by simply adding the signature
> (TextProfileSignature in particular) to the current id key. Any thoughts on
> whether this is feasible and, if so, where in the codebase I should start
> looking? I am aware Heritrix specializes in archiving, but I would really
> like to stick with Nutch if possible unless it absolutely doesn't make sense.
> 
> Thanks
> 
> James
