Hey Louis:

I appreciate the vote of confidence in the parse tree approach :) ...and
agree that the numbers on rewriting are pretty encouraging. That said, a few
more points on which I'd like to get your thoughts:

1. The case for String based substitution applies to any circumstance in
which a given expansion spans the equivalent of multiple nodes. Because
GadgetHtmlNode treats text-type nodes as literal text, the expansion of a
message whose value is, for instance, "<b>foo!</b>" will in fact be rendered
"&lt;b&gt;foo!&lt;&gt;". That's not necessarily a bad thing, but it and
cases like it are almost certainly non-backward-compatible with _some_ use
cases out there.

2. We've discussed using this new content rewriting infrastructure as a way
of implementing UP, MSG, and other Substitutions, and potentially other
basic content manipulations. Past functional regressions (however
dismissable, per #1), satisfying these sorts of requests will be made
inherently more expensive, and by a considerable margin, due to parsing
cost. Further, they will fail when the parser fails. The alternative is to
keep these routines as-is. That's reasonable, but in general there remains a
set of requests for which parse tree manipulation simply isn't necessary,
which would be made much more expensive if a parse tree is the only way to
manipulate content through a rewriter.

3. By my count, the number of lines of code associated with maintaining both
String and Parse Tree representations of content is 62, in
MutableHtmlContent.java. True, there's some complexity inherent in two
representations being made available, and the corresponding burden to ensure
that all String-based rewriters are bundled together and all Parse-based
ones are as well. Just about every rewriter will use only one
representation, though, so the latter burden is the major issue. Given our
dogged insistence (rightly so) on performance/latency reduction, isn't that
a reasonable price to pay?

I suppose it's ironic that I'm championing the String representation given
the main thrust of my changes is to introduce a parse tree. If you both -
and anyone else in the community with an opinion! - still feel that parse
tree only is the way to go, I can accede... though to set expectations,
it'll take me a little while to get to making those mods. In the meantime,
I'd like to add *some* ability to retrieve <intermediary/mutable content
representation, currently MutableHtmlContent, later maybe just parse tree>
from an HttpResponse in such a way that it can (through collaboration with
HttpResponseBuilder) be shared between HttpResponse instances to avoid
re-parsing in a chain of Rewriters. Seem reasonable? Like Kevin said, we're
dealing with code, not stone here, and we can fiddle with the relevant APIs
until the balance feels right.

<just got Kevin's response as well - agreed on benchmarking this>

John

On Thu, Sep 4, 2008 at 9:44 PM, Louis Ryan <[EMAIL PROTECTED]> wrote:

> It seems that we have reached a point in the evolution of this feature
> where
> we should just commit to the parse-tree implementation. I think its fair to
> say that the performance was reasonably proved out by Johns results
> published earlier. I havent seen any objections to removing the
> lexing/string based rewriters so I think you can just go ahead and drop
> them
> at this point and make the interface simplifications Kevin is asking for.
> The case for supporting string based substitution because of the potential
> for prefs/msg substitution to cause unbalanced/unterminated tags is I think
> too weak to bother overburdening Shindig.
>
> On Thu, Sep 4, 2008 at 6:19 PM, John Hjelmstad <[EMAIL PROTECTED]> wrote:
>
> > On Thu, Sep 4, 2008 at 6:12 PM, Kevin Brown <[EMAIL PROTECTED]> wrote:
> >
> > > On Thu, Sep 4, 2008 at 5:49 PM, John Hjelmstad <[EMAIL PROTECTED]>
> wrote:
> > >
> > > > I don't get it. In the Gadget case, it's always HTML content.
> > >
> > >
> > > Yes, and the gadget case is only a small fraction of things that get
> > > rewritten. Far more CSS and javascript is dealt with when rewriting is
> > > enabled.
> >
> >
> > True, but the Gadget case is all this CL deals with :)
> >
> >
> > >
> > >
> > >
> > > > That's why
> > > > MutableHtmlContent could IMO be a base class of Gadget - an
> > intermediary
> > > > representation of a request being processed, a mutable concept
> > > description
> > > > as I hear it - while being *potentially* a member of HttpResponse.
> > > > Rewriters
> > > > modify HttpResponse right now as it is, so it's effectively mutable
> > even
> > > if
> > > > the object isn't.
> > >
> > >
> > > No they don't, they produce a new HttpResponse. That's a very, very
> > > different thing from modifying an HttpResponse inline.
> >
> >
> > Also true, per my comments, which is why I'm open to a different
> interface
> > for the HttpResponse rewriter. So long as it meets the main requirement
> > that
> > the HttpResponse's parse tree can be shared by rewriters (thus not
> > re-parsed), I'm fine.
> >
> >
> > >
> > >
> > >
> > > > I don't really have religion on how that modification is
> > > > structured, to be honest with you -- the ContentRewriter interface
> > > dealing
> > > > with HttpResponse manipulation could be
> > > ContentRewriter.rewrite(HttpRequest
> > > > request[Context], HttpResponse originalResponse, MutableHtmlContent
> > > > mutableContent) for all I care. It just seems to make sense to me for
> > the
> > > > HttpResponse object itself to know whether or not its contents *can*
> be
> > > > represented as [Mutable] HTML due to its ownership of its own MIME
> type
> > > > metadata.
> > >
> > >
> > > And the calling wrapper can decide by reading the mime type as well --
> > all
> > > without having to pollute the HTTP pipeline with explicit knowledge of
> > > rewriting and introducing heavy coupling.
> >
> >
> > Yeah, I'm fine with that logic being elsewhere. Where would you like it?
> > BasicContentRewriterRegistry (for now)?
> >
> >
> > >
> > >
> > >
> > > >
> > > > Underlying much of this is the bet-hedging I'm doing by maintaining
> the
> > > > ability to modify either String or Parse Tree representation. I
> expect
> > > and
> > > > hope that in practice, only the parse tree is ever really used. But
> > > > consider:
> > >
> > >
> > > Why do we have to do "bet-hedging"? It's not like when you write code
> > it's
> > > being chiseled into stone tablets. If the parse tree winds up not
> > working,
> > > we get rid of it.
> >
> >
> > We have bet-hedging because it (String+parse tree) wasn't difficult to
> > implement, is already done and works, and could be useful. I enjoy
> letting
> > the marketplace, of sorts, decide on best use. If nobody uses the String
> > rep, we can just change the interface(s) to use a ParseTree.
> >
> >
> > >
> > >
> > >
> > > >
> > > >
> > > > <__MSG_nodeName__>content</__MSG_nodeName__>
> > > >
> > > > This can really only be manipulated as a String. Yes, this has to do
> > with
> > > > hangman substitution, which has a host of ugliness and other issues
> > with
> > > > it,
> > > > but at present A) it's our localization solution, and B) hangman
> > > > substitution applies to the entire contents of a document, not just
> > > Strings
> > > > in parsed context, per the spec (which says nothing about it).
> > >
> > >
> > > The spec doesn't say anything about it, and actually supporting this
> case
> > > is
> > > probably unnecessary if it complicates our code path.
> > >
> > >
> > > >
> > > >
> > > > I'm currently passing in Gadget to the Gadget rewriting API for
> another
> > > > reason, notably JsLibraries, which are another form of mutable Gadget
> > > > content. You and I have, unless I'm totally misunderstanding what
> we've
> > > > said, had discussions about JsLibraries really just being another
> type
> > of
> > > > node in the tree. That's great for a future update of the Gadget
> object
> > -
> > > > perhaps with the Js Library code listening to relevant nodes in the
> > parse
> > > > tree - but isn't available today. Today, the parse tree is only
> what's
> > > > derived from <Content> in the gadget.
> > >
> > >
> > > JsLibraries should go away entirely with a proper rewriter. The
> rewriter
> > > that is responsible for adding the libraries would simply inject script
> > > elements directly into the parse tree. Tacking JsLibraries onto Gadget
> > was
> > > an ugly kludge designed to solve the problem at hand. With a proper
> > > flexible
> > > rewriting system, we don't need that kludge any longer.
> >
> >
> > Agreed, per my comments. I'd like to make modifications to that effect
> > soon,
> > but we're just not there yet. Once we are, then passing in a ParseTree
> > really starts looking more compelling.
> >
> >
> > >
> > >
> > >
> > > >
> > > > On Thu, Sep 4, 2008 at 5:37 PM, Kevin Brown (JIRA) <[EMAIL PROTECTED]>
> > > > wrote:
> > > >
> > > > >
> > > > >    [
> > > > >
> > > >
> > >
> >
> https://issues.apache.org/jira/browse/SHINDIG-571?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12628527#action_12628527
> > > > ]
> > > > >
> > > > > Kevin Brown commented on SHINDIG-571:
> > > > > -------------------------------------
> > > > >
> > > > > It's not HTML content. It's only HTML content a small percentage of
> > the
> > > > > time.
> > > > >
> > > > > Why can't rewriters just be implemented to take a mutable parse
> tree
> > > and
> > > > > the appropriate context (i.e. the original objects, probably mapped
> > to
> > > a
> > > > > common interface)?
> > > > >
> > > > > HttpResponse MUST NOT be made mutable -- including the addition of
> > > > mutable
> > > > > fields. This is critical.
> > > > >
> > > > > We just worked through a whole lot of stuff to make it immutable
> > > (again).
> > > > > This is incredibly important since HttpResponse is the primary
> object
> > > in
> > > > > caches. Once again, this is cramming behavior into HttpResponse
> that
> > > just
> > > > > doesn't belong there.
> > > > >
> > > > > The interface should be changed to just take both the original
> object
> > > and
> > > > > the mutable thing being operated on. Cramming it into some other
> > object
> > > > is
> > > > > nothing but a lot of bugs waiting to happen (i.e. Law of Demeter
> > > > violation).
> > > > >
> > > > > > Create shared intermediary representation of mutable HTML content
> > for
> > > > > Gadget and HttpResponse to use.
> > > > > >
> > > > >
> > > >
> > >
> >
> -----------------------------------------------------------------------------------------------------
> > > > > >
> > > > > >                 Key: SHINDIG-571
> > > > > >                 URL:
> > > https://issues.apache.org/jira/browse/SHINDIG-571
> > > > > >             Project: Shindig
> > > > > >          Issue Type: Sub-task
> > > > > >          Components: Common Components (Java)
> > > > > >            Reporter: John Hjelmstad
> > > > > >
> > > > > > While it still appears to make sense to have two Content Rewriter
> > > APIs,
> > > > > one for HTTP responses (makeRequest et al) and one for Gadget
> > > rewriting,
> > > > it
> > > > > remains true that most rewriters will treat each relatively the
> same:
> > > > obtain
> > > > > contents, manipulate, output.
> > > > > > As much code as possible should be shared for each rewriter API.
> > This
> > > > is
> > > > > facilitated by a shared notion of mutable HTML content - the type
> > > that's
> > > > > actually rewritten.
> > > > > > Thus I'd like to create a new class, MutableHtmlContent, which
> has
> > > the
> > > > > following methods:
> > > > > > MutableHtmlContent.setContent(String)
> > > > > > String MutableHtmlContent.getContent()
> > > > > > ParsedHtmlNode MutableHtmlContent.getParseTree()
> > > > > > ...in other words, the content manipulation methods from Gadget
> > > pulled
> > > > > out into a context-agnostic object that simply maintains a String
> and
> > > its
> > > > > parse-tree equivalent.
> > > > > > This code will be shared between Gadget and HttpResponse. At
> > present,
> > > > > this arrangement seems convenient:
> > > > > > 1. Gadget extends MutableHtmlContent. It could simply contain a
> > > > > MutableHtmlContent object as well, but doing so would simply have
> it
> > > > passing
> > > > > through same-named methods. This can be changed later if conditions
> > > > dictate.
> > > > > > 2. Augment HttpResponse with a new method:
> > > > > HttpResponse.getMutableHtmlContent(), which returns a
> > > MutableHtmlContent
> > > > if
> > > > > the MIME type is appropriate. A "force" option could be provided
> here
> > > as
> > > > > well.
> > > > > > Given these changes, rewriters can easily be implemented in terms
> > of
> > > > > helper methods that consume and manipulate MutableHtmlContent in
> the
> > > > > majority case.
> > > > >
> > > > > --
> > > > > This message is automatically generated by JIRA.
> > > > > -
> > > > > You can reply to this email to add a comment to the issue online.
> > > > >
> > > > >
> > > >
> > >
> >
>

Reply via email to