Using the full body contents is probably unavoidable. Depending on the implementation, this may be a distributed cache, so we should code for that. I'm inclined to agree with Louis that MD5 or SHA1 seems fine. HashUtils (in common) provides the former, while DigestUtils in apache.commons.codec offers SHA (and another MD5 implementation, it seems).
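For illustration, here is a minimal sketch of deriving a cache key from the full body contents using only the JDK's MessageDigest. The class and method names (BodyCacheKey, sha1Hex) are hypothetical and are not the HashUtils or DigestUtils APIs. Because the key depends only on the content, every server in a distributed cache would compute the same key for the same body.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper for illustration; not the HashUtils class in common.
final class BodyCacheKey {
  private BodyCacheKey() {}

  /** Returns the lowercase hex SHA-1 digest of the full body contents. */
  static String sha1Hex(String body) {
    try {
      MessageDigest md = MessageDigest.getInstance("SHA-1");
      byte[] digest = md.digest(body.getBytes(StandardCharsets.UTF_8));
      StringBuilder hex = new StringBuilder(digest.length * 2);
      for (byte b : digest) {
        // %02x prints each byte as two hex digits, treating it as unsigned.
        hex.append(String.format("%02x", b));
      }
      return hex.toString();
    } catch (NoSuchAlgorithmException e) {
      // Every Java platform implementation is required to provide SHA-1.
      throw new IllegalStateException(e);
    }
  }
}
```

Swapping "SHA-1" for "SHA-256" in getInstance would give the stronger digest at the cost of a longer key.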
--John

On Tue, Sep 9, 2008 at 4:45 PM, Kevin Brown <[EMAIL PROTECTED]> wrote:

> On Tue, Sep 9, 2008 at 4:38 PM, Brian Eaton <[EMAIL PROTECTED]> wrote:
>
> > On Tue, Sep 9, 2008 at 4:25 PM, John Hjelmstad <[EMAIL PROTECTED]> wrote:
> > > I briefly considered String hashCode, but quickly recognized that was a
> > > bad idea. MD5 of contents sounds reasonable. Brian, thoughts?
> >
> > I suspect using the entire input body contents is out of the question,
> > though that was my initial thought.
> >
> > Don't use MD5. Nobody knows how to attack it for this kind of
> > application, yet, but a lot of progress has been made. SHA1 is
> > probably OK. SHA-256 would be great, HMAC-SHA1 would be great, except
> > then you have to worry about keying, which is a pain. This cache is
> > potentially shared across multiple servers, right?
> >
> > If it's a single server cache, HMAC-SHA1 with a random key.
> >
> > The cache key generated by the HTTP content fetchers might be useful
> > for this as well, assuming you can get ahold of it somehow.
>
> There's a utility checked in to produce a base32 encoded SHA1 checked into
> common that can be used for this.
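
The single-server option mentioned in the quoted thread (HMAC-SHA1 with a random key) could look roughly like the sketch below, using the JDK's javax.crypto.Mac. The class name KeyedBodyCacheKey and the 20-byte key length are assumptions made for illustration. The key is generated per instance and never shared, so keys computed on one server would not match those on another, which is the keying problem raised above for a shared cache.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;

// Hypothetical helper for illustration; suitable only for a single-server cache.
final class KeyedBodyCacheKey {
  private final SecretKeySpec key;

  KeyedBodyCacheKey() {
    // Assumption: a 20-byte random key, generated once per instance.
    byte[] raw = new byte[20];
    new SecureRandom().nextBytes(raw);
    this.key = new SecretKeySpec(raw, "HmacSHA1");
  }

  /** Returns the lowercase hex HMAC-SHA1 of the body under this instance's key. */
  String keyFor(String body) {
    try {
      Mac mac = Mac.getInstance("HmacSHA1");
      mac.init(key);
      byte[] tag = mac.doFinal(body.getBytes(StandardCharsets.UTF_8));
      StringBuilder hex = new StringBuilder(tag.length * 2);
      for (byte b : tag) {
        hex.append(String.format("%02x", b));
      }
      return hex.toString();
    } catch (GeneralSecurityException e) {
      throw new IllegalStateException(e);
    }
  }
}
```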

