Got it, thanks for the clarification. Sounds like a good optimization to me... I'll impl it in a follow-up to just getting the base infrastructure in. John
On Wed, Sep 10, 2008 at 1:19 PM, Louis Ryan <[EMAIL PROTECTED]> wrote: > The overwhelming majority of the lookups will be for the rewritten versions > of cached Http fetches. Those are still cached as HTTP to maintain their > cache control semantics. If you stuff the MD5 in the originally cached HTTP > response you can use it to perform the lookup of the rewritten content. > > On Wed, Sep 10, 2008 at 1:14 PM, John Hjelmstad <[EMAIL PROTECTED]> wrote: > > > Perhaps I'm missing something, but note that in order to share a cache > the > > data I'm storing are Strings, not HttpResponses (or Gadgets). > > As to hash method, MD5 it is. > > > > On Wed, Sep 10, 2008 at 11:18 AM, Kevin Brown <[EMAIL PROTECTED]> wrote: > > > > > On Wed, Sep 10, 2008 at 10:57 AM, Louis Ryan <[EMAIL PROTECTED]> wrote: > > > > > > > A little reading would indicate the MD5 is about 40% faster to > compute > > > than > > > > SHA1 so I would suggest we use it. It would also make sense to stuff > > the > > > > MD5 > > > > of the cached content into an HTTP header on the cached version of > the > > > > original content so we can avoid re-computing it. > > > > > > > > > Since HTTP response is immutable, we can do this at construction time > in > > > pretty much the same way that it's done for GadgetSpec (i.e. adding a > > > getChecksum method). > > > > > > > > > > > > > > > > > > On Tue, Sep 9, 2008 at 6:28 PM, Louis Ryan <[EMAIL PROTECTED]> wrote: > > > > > > > > > I would be a very wary of relying on VM implementation specifics > like > > > > > XXXRweriter.class.hashCode(). You certainly don't want the same > class > > > > > generating different coes on different instances in a cluster where > > the > > > > > cache is shared. Also dont use serialVersionUID as that is only > > > intended > > > > for > > > > > serialization compatability. Reading the class as an InputStream > from > > > the > > > > > ClassLoader and then computing a hash might work but seems more > > trouble > > > > than > > > > > its worth. > > > > > > > > > > In general I think a a manually maintained version no. might > actually > > > be > > > > > your best bet. > > > > > > > > > > > > > > > On Tue, Sep 9, 2008 at 5:29 PM, John Hjelmstad <[EMAIL PROTECTED]> > > > wrote: > > > > > > > > > >> Back to this sub-thread... > > > > >> Adding a getVersion() parameter seems error prone and a little > > > > burdensome. > > > > >> In theory the version needs to change when any modification in > > > potential > > > > >> output for a given input is made, ie. not just optimization. > That's > > > not > > > > >> always the easiest thing to analyze and get right. Meanwhile, > > > > invalidating > > > > >> a > > > > >> bunch of cache keys (by registering a new rewriter impl) isn't > > > > especially > > > > >> costly: if the cache is useful at all, it will be hit often, ie. > > > > refreshed > > > > >> quickly. > > > > >> > > > > >> --John > > > > >> > > > > >> On Tue, Sep 9, 2008 at 1:50 PM, Louis Ryan <[EMAIL PROTECTED]> > > wrote: > > > > >> > > > > >> > The versioning should be more explicit than that I think. Maybe > > add > > > a > > > > >> > getVersion function to the rewriter interface so they can manage > > > their > > > > >> own > > > > >> > changes > > > > >> > > > > > >> > On Tue, Sep 9, 2008 at 12:39 PM, John Hjelmstad < > [EMAIL PROTECTED] > > > > > > > >> wrote: > > > > >> > > > > > >> > > Interesting suggestion. I can include the rewriter class names > > and > > > > >> their > > > > >> > > class hash codes or some other such versioning construct. > > > > >> > > > > > > >> > > On Tue, Sep 9, 2008 at 12:32 PM, Ben Laurie <[EMAIL PROTECTED]> > > > > wrote: > > > > >> > > > > > > >> > > > On Tue, Sep 9, 2008 at 8:17 PM, John Hjelmstad < > > > [EMAIL PROTECTED]> > > > > >> > wrote: > > > > >> > > > > Excellent, agreed. CLs forthcoming. > > > > >> > > > > > > > >> > > > Can you include the version numbers of the rewriters in the > > > cache > > > > >> key? > > > > >> > > > > > > > >> > > > > > > > > >> > > > > On Tue, Sep 9, 2008 at 12:14 PM, Kevin Brown < > > [EMAIL PROTECTED] > > > > > > > > >> > wrote: > > > > >> > > > > > > > > >> > > > >> #2 is the only really viable option. If we have to put > > > caching > > > > >> logic > > > > >> > > in > > > > >> > > > 10 > > > > >> > > > >> different places we'll screw it up 9 different times :). > > > > >> > > > >> > > > > >> > > > >> On Tue, Sep 9, 2008 at 12:11 PM, John Hjelmstad < > > > > >> [EMAIL PROTECTED]> > > > > >> > > > wrote: > > > > >> > > > >> > > > > >> > > > >> > As discussed on a few threads and tracked in JIRA issue > ( > > > > >> > > > >> > http://issues.apache.org/jira/browse/SHINDIG-579), we > > need > > > > to > > > > >> > move > > > > >> > > > >> > rewriting > > > > >> > > > >> > logic out of AbstractHttpCache. Yet we should maintain > > > > >> rewritten > > > > >> > > > content > > > > >> > > > >> > caching capability. The question is where to put it. > > > > >> > > > >> > I see two options, at a high level: > > > > >> > > > >> > 1. In code that calls > > > > >> > > > >> ContentRewriterRegistry.rewrite(HttpResponse|Gadget). > > > > >> > > > >> > Eg. MakeRequestHandler, ProxyHandler, > ViewContentFetcher, > > > > >> > > > GadgetServer, > > > > >> > > > >> and > > > > >> > > > >> > the near-future Renderer and Preloader. This allows > > > > >> finer-grained > > > > >> > > > control > > > > >> > > > >> > over caching behavior in context, at the cost of > > > distributing > > > > >> > > caching > > > > >> > > > >> logic > > > > >> > > > >> > in various places. > > > > >> > > > >> > 2. In > > ContentRewriterRegistry.rewrite(HttpResponse|Gadget) > > > > >> itself, > > > > >> > > if > > > > >> > > > so > > > > >> > > > >> > chosen. Caching logic can be consolidated in > > > > >> > > > >> > CachingContentRewriterRegistry, > > > > >> > > > >> > for instance (which will no longer subclass > > > > >> > > > CachingWebRetrievalFactory), > > > > >> > > > >> > and > > > > >> > > > >> > be considered an optimization to rewriting. > > > > >> > > > >> > > > > > >> > > > >> > I'm inclined to go #2. Rewriters themselves can be > > > augmented > > > > >> with > > > > >> > > > caching > > > > >> > > > >> > hints if necessary, and be assumed deterministic for a > > > given > > > > >> cache > > > > >> > > key > > > > >> > > > in > > > > >> > > > >> > the meantime. Consolidating rewriting logic makes it > > easier > > > > to > > > > >> > share > > > > >> > > > the > > > > >> > > > >> > cache itself. > > > > >> > > > >> > > > > > >> > > > >> > Still, I might be missing situations in which > additional > > > > >> context > > > > >> > > > inherent > > > > >> > > > >> > to > > > > >> > > > >> > the calling context is needed to make a caching > decision. > > > > >> > > > >> > > > > > >> > > > >> > --John > > > > >> > > > >> > > > > > >> > > > >> > > > > >> > > > > > > > > >> > > > > > > > >> > > > > > > >> > > > > > >> > > > > > > > > > > > > > > > > > > > >

