Now that I'm actually getting into it...
It seems that what we want is for the hash to be generated only before
HttpResponse is cached. We shouldn't waste time generating MD5s every time
we generate an HttpResponse object.

That suggests:
HttpResponse.getChecksum() {
  if (checksum != null) {
    checksum = HashUtil.checksum(getResponseAsString());
  }
  return checksum;
}

...which also requires that caching code knows to cache the checksum as
well. So I see Louis' point in storing the MD5 in a header. The helper
method is a nice convenience though.

In either case we need to generate the MD5 in the AbstractHttpCache layer,
which is sort of ugly (ie. calling getChecksum() just to make sure it's
stored as a side-effect), but that seems unavoidable.

--John

On Wed, Sep 10, 2008 at 2:17 PM, John Hjelmstad <[EMAIL PROTECTED]> wrote:

> Yep, exactly, I like that way of structuring it.
>
>
> On Wed, Sep 10, 2008 at 1:32 PM, Kevin Brown <[EMAIL PROTECTED]> wrote:
>
>> I was implying that you could keep the checksum on the HttpResponse
>> object,
>> yielding:
>>
>> cache.addEntry(response.getChecksum(), rewrittenContent);
>>
>> On Wed, Sep 10, 2008 at 1:14 PM, John Hjelmstad <[EMAIL PROTECTED]> wrote:
>>
>> > Perhaps I'm missing something, but note that in order to share a cache
>> the
>> > data I'm storing are Strings, not HttpResponses (or Gadgets).
>> > As to hash method, MD5 it is.
>> >
>> > On Wed, Sep 10, 2008 at 11:18 AM, Kevin Brown <[EMAIL PROTECTED]> wrote:
>> >
>> > > On Wed, Sep 10, 2008 at 10:57 AM, Louis Ryan <[EMAIL PROTECTED]>
>> wrote:
>> > >
>> > > > A little reading would indicate the MD5 is about 40% faster to
>> compute
>> > > than
>> > > > SHA1 so I would suggest we use it. It would also make sense to stuff
>> > the
>> > > > MD5
>> > > > of the cached content into an HTTP header on the cached version of
>> the
>> > > > original content so we can avoid re-computing it.
>> > >
>> > >
>> > > Since HTTP response is immutable, we can do this at construction time
>> in
>> > > pretty much the same way that it's done for GadgetSpec (i.e. adding a
>> > > getChecksum method).
>> > >
>> > >
>> > > >
>> > > >
>> > > > On Tue, Sep 9, 2008 at 6:28 PM, Louis Ryan <[EMAIL PROTECTED]>
>> wrote:
>> > > >
>> > > > > I would be a very wary of relying on VM implementation specifics
>> like
>> > > > > XXXRweriter.class.hashCode(). You certainly don't want the same
>> class
>> > > > > generating different coes on different instances in a cluster
>> where
>> > the
>> > > > > cache is shared. Also dont use serialVersionUID as that is only
>> > > intended
>> > > > for
>> > > > > serialization compatability. Reading the class as an InputStream
>> from
>> > > the
>> > > > > ClassLoader and then computing a hash might work but seems more
>> > trouble
>> > > > than
>> > > > > its worth.
>> > > > >
>> > > > > In general I think a a manually maintained version no. might
>> actually
>> > > be
>> > > > > your best bet.
>> > > > >
>> > > > >
>> > > > > On Tue, Sep 9, 2008 at 5:29 PM, John Hjelmstad <[EMAIL PROTECTED]>
>> > > wrote:
>> > > > >
>> > > > >> Back to this sub-thread...
>> > > > >> Adding a getVersion() parameter seems error prone and a little
>> > > > burdensome.
>> > > > >> In theory the version needs to change when any modification in
>> > > potential
>> > > > >> output for a given input is made, ie. not just optimization.
>> That's
>> > > not
>> > > > >> always the easiest thing to analyze and get right. Meanwhile,
>> > > > invalidating
>> > > > >> a
>> > > > >> bunch of cache keys (by registering a new rewriter impl) isn't
>> > > > especially
>> > > > >> costly: if the cache is useful at all, it will be hit often, ie.
>> > > > refreshed
>> > > > >> quickly.
>> > > > >>
>> > > > >> --John
>> > > > >>
>> > > > >> On Tue, Sep 9, 2008 at 1:50 PM, Louis Ryan <[EMAIL PROTECTED]>
>> > wrote:
>> > > > >>
>> > > > >> > The versioning should be more explicit than that I think. Maybe
>> > add
>> > > a
>> > > > >> > getVersion function to the rewriter interface so they can
>> manage
>> > > their
>> > > > >> own
>> > > > >> > changes
>> > > > >> >
>> > > > >> > On Tue, Sep 9, 2008 at 12:39 PM, John Hjelmstad <
>> [EMAIL PROTECTED]
>> > >
>> > > > >> wrote:
>> > > > >> >
>> > > > >> > > Interesting suggestion. I can include the rewriter class
>> names
>> > and
>> > > > >> their
>> > > > >> > > class hash codes or some other such versioning construct.
>> > > > >> > >
>> > > > >> > > On Tue, Sep 9, 2008 at 12:32 PM, Ben Laurie <[EMAIL PROTECTED]
>> >
>> > > > wrote:
>> > > > >> > >
>> > > > >> > > > On Tue, Sep 9, 2008 at 8:17 PM, John Hjelmstad <
>> > > [EMAIL PROTECTED]>
>> > > > >> > wrote:
>> > > > >> > > > > Excellent, agreed. CLs forthcoming.
>> > > > >> > > >
>> > > > >> > > > Can you include the version numbers of the rewriters in the
>> > > cache
>> > > > >> key?
>> > > > >> > > >
>> > > > >> > > > >
>> > > > >> > > > > On Tue, Sep 9, 2008 at 12:14 PM, Kevin Brown <
>> > [EMAIL PROTECTED]
>> > > >
>> > > > >> > wrote:
>> > > > >> > > > >
>> > > > >> > > > >> #2 is the only really viable option. If we have to put
>> > > caching
>> > > > >> logic
>> > > > >> > > in
>> > > > >> > > > 10
>> > > > >> > > > >> different places we'll screw it up 9 different times :).
>> > > > >> > > > >>
>> > > > >> > > > >> On Tue, Sep 9, 2008 at 12:11 PM, John Hjelmstad <
>> > > > >> [EMAIL PROTECTED]>
>> > > > >> > > > wrote:
>> > > > >> > > > >>
>> > > > >> > > > >> > As discussed on a few threads and tracked in JIRA
>> issue (
>> > > > >> > > > >> > http://issues.apache.org/jira/browse/SHINDIG-579), we
>> > need
>> > > > to
>> > > > >> > move
>> > > > >> > > > >> > rewriting
>> > > > >> > > > >> > logic out of AbstractHttpCache. Yet we should maintain
>> > > > >> rewritten
>> > > > >> > > > content
>> > > > >> > > > >> > caching capability. The question is where to put it.
>> > > > >> > > > >> > I see two options, at a high level:
>> > > > >> > > > >> > 1. In code that calls
>> > > > >> > > > >> ContentRewriterRegistry.rewrite(HttpResponse|Gadget).
>> > > > >> > > > >> > Eg. MakeRequestHandler, ProxyHandler,
>> ViewContentFetcher,
>> > > > >> > > > GadgetServer,
>> > > > >> > > > >> and
>> > > > >> > > > >> > the near-future Renderer and Preloader. This allows
>> > > > >> finer-grained
>> > > > >> > > > control
>> > > > >> > > > >> > over caching behavior in context, at the cost of
>> > > distributing
>> > > > >> > > caching
>> > > > >> > > > >> logic
>> > > > >> > > > >> > in various places.
>> > > > >> > > > >> > 2. In
>> > ContentRewriterRegistry.rewrite(HttpResponse|Gadget)
>> > > > >> itself,
>> > > > >> > > if
>> > > > >> > > > so
>> > > > >> > > > >> > chosen. Caching logic can be consolidated in
>> > > > >> > > > >> > CachingContentRewriterRegistry,
>> > > > >> > > > >> > for instance (which will no longer subclass
>> > > > >> > > > CachingWebRetrievalFactory),
>> > > > >> > > > >> > and
>> > > > >> > > > >> > be considered an optimization to rewriting.
>> > > > >> > > > >> >
>> > > > >> > > > >> > I'm inclined to go #2. Rewriters themselves can be
>> > > augmented
>> > > > >> with
>> > > > >> > > > caching
>> > > > >> > > > >> > hints if necessary, and be assumed deterministic for a
>> > > given
>> > > > >> cache
>> > > > >> > > key
>> > > > >> > > > in
>> > > > >> > > > >> > the meantime. Consolidating rewriting logic makes it
>> > easier
>> > > > to
>> > > > >> > share
>> > > > >> > > > the
>> > > > >> > > > >> > cache itself.
>> > > > >> > > > >> >
>> > > > >> > > > >> > Still, I might be missing situations in which
>> additional
>> > > > >> context
>> > > > >> > > > inherent
>> > > > >> > > > >> > to
>> > > > >> > > > >> > the calling context is needed to make a caching
>> decision.
>> > > > >> > > > >> >
>> > > > >> > > > >> > --John
>> > > > >> > > > >> >
>> > > > >> > > > >>
>> > > > >> > > > >
>> > > > >> > > >
>> > > > >> > >
>> > > > >> >
>> > > > >>
>> > > > >
>> > > > >
>> > > >
>> > >
>> >
>>
>
>

Reply via email to