Re: Content Rewriter Modularization: Design/Change

Louis Ryan Mon, 11 Aug 2008 12:08:24 -0700

John,

From a practicality standpoint I'm a little nervous about this plan to make
RPC calls out of a Java process to a native process to fetch a parse tree
for transformations that have to occur in real time. I don't think the
motivating factor here is to accept all inputs that browsers can. Gadget
developers will tailor their markup to the platform, as they have done
already. I would greatly prefer that we pick one 'good' parser and stick with
it, for all the manageability and consumability benefits that come with that
decision. Perhaps I'm missing something here?


-Louis

On Mon, Aug 11, 2008 at 11:59 AM, John Hjelmstad <[EMAIL PROTECTED]> wrote:

> On Fri, Aug 8, 2008 at 6:10 AM, Ben Laurie <[EMAIL PROTECTED]> wrote:
>
> > [+google-caja-discuss]
> >
> > On Thu, Aug 7, 2008 at 9:27 PM, John Hjelmstad <[EMAIL PROTECTED]> wrote:
> > > On Thu, Aug 7, 2008 at 3:20 AM, Ben Laurie <[EMAIL PROTECTED]> wrote:
> > >
> > >> On Wed, Aug 6, 2008 at 11:34 PM, John Hjelmstad <[EMAIL PROTECTED]>
> > wrote:
> > >> > This proposal effectively enables the renderer to become a
> multi-pass
> > >> > compiler for gadget content (essentially, arbitrary web content).
> Such
> > a
> > >> > compiler can provide several benefits: static optimization of gadget
> > >> content
> > >> > (auto-proxying of images, whitespace/comment removal, consolidation
> of
> > >> CSS
> > >> > blocks), security benefits (caja et al), new functionality
> (annotation
> > of
> > >> > content for stats, document analysis, container-specific features),
> > etc.
> > >> To
> > >> > my knowledge no such infrastructure exists today (with the possible
> > >> > exception of Caja itself, which I'd like to dovetail with this
> work).
> > >>
> > >> Caja clearly provides a large chunk of the code you'd need for this.
> > >> I'd like to hear how we'd manage to avoid duplication between the two
> > >> projects.
> > >>
> > >> A generalised framework for manipulating content sounds like a great
> > >> idea, but probably should not live in either of the two projects (Caja
> > >> and Shindig) but rather should be shared by both of them, I suspect.
> > >
> > >
> > > I agree on both counts. As I mentioned, the piece of this idea that I
> > expect
> > > to change the most is the parse tree, and Caja's .parser.html and
> > > .parser.css packages contain much of what I've thrown in here as a
> base.
> > >
> > > My key requirements are:
> > > * Lightweight framework.
> > > * Parser modularity, mostly for HTML parsers (to re-use the good work
> > done
> > > by WebKit or Gecko.. CSS/JS can come direct from Caja I'd bet)
> > > * Automatic maintenance of DOM<->String conversion.
> > > * Easy to manipulate structure.
> >
> > I'm not sure what the value of parser modularity is? If the resulting
> > tree is different, then that's a problem for people processing the
> > tree. And if it is not, then why do we care?
>
>
> IMO the value of parser modularity is that the lenient parsers native to
> browsers can be used in place of those that might not accept all inputs.
> One
> could (and I'd like to) adapt WebKit or Gecko's parsing code into a server
> that runs parallel to Shindig and provides a "local RPC" service for
> parsing
> semi-structured HTML. The resulting tree for WebKit's parser might be
> different than that for an XHTML parser, Gecko's parser, etc, but if the
> algorithm implemented atop it is rule-based rather than strict-structure
> based that should be fine, no?
>
>
> >
> >
> > >
> > > I'd love to see both projects share the same base syntax tree
> > > representations. I considered .parser.html(.DomTree) and .parser.css
> for
> > > these, but at the moment these appeared to be a little more tied to
> > Caja's
> > > lexer/parser implementation than I preferred (though I admit
> > > AbstractParseTreeNode contains most of what's needed).
> > >
> > > To be sure, I don't see this as an end-all-be-all transformation system
> > in
> > > any way. I'd just like to put *something* reasonable in place that we
> can
> > > play with, provide some benefit, and enhance into a truly sophisticated
> > > vision of document rewriting.
> > >
> > >
> > >>
> > >>
> > >> >  c. Add Gadget.getParsedContent().
> > >> >    i. Returns a mutable GadgetContentParseTree used to manipulate
> > Gadget
> > >> > Contents.
> > >> >    ii. Mutable tree calls back to the Gadget object indicating when
> > any
> > >> > change is made, and emits an error if setContent() has been called
> in
> > the
> > >> > interim.
> > >>
> > >> In Caja we have been moving towards immutable trees...
> > >
> > >
> > > Interested to hear more about this. The whole idea is for the gadget's
> > tree
> > > representation to be modifiable. Doing that with immutable trees to me
> > > suggests that a rewriter would have to create a completely new tree and
> > set
> > > it as a representation of new content. That's convenient as far as the
> > > Gadget's maintenance of String<->Tree representations is concerned...
> but
> > > seems pretty heavyweight for many types of edits: in-situ modifications
> > of
> > > text, content reordering, etc. That's particularly so in a
> > single-threaded
> > > (viz rewriting) environment.
> >
> > Never having been entirely sold on the concept, I'll let those on the
> > Caja team who advocate immutability explain why.
> >
>
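
For reference, the Gadget.getParsedContent() contract quoted above (a mutable
tree that calls back to the Gadget whenever it changes, and that emits an
error if setContent() has been called in the interim) might be sketched
roughly as follows. This is only an illustration under stated assumptions,
not Shindig's or Caja's actual code: the ParseTree class, the version
counter, onTreeChanged(), and the one-node-per-line stand-in "parser" are all
hypothetical names and simplifications introduced here.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of a Gadget that keeps String and tree forms in sync. */
class Gadget {
    private String content;
    private ParseTree tree;     // cached tree, null until first requested
    private boolean treeDirty;  // true if the tree was edited since last serialization
    private int version;        // bumped by setContent() so older trees become stale

    Gadget(String content) { this.content = content; }

    void setContent(String newContent) {
        content = newContent;
        tree = null;            // drop the cached tree; it no longer matches
        treeDirty = false;
        version++;
    }

    String getContent() {
        if (treeDirty) {        // lazily re-serialize only after tree edits
            content = tree.serialize();
            treeDirty = false;
        }
        return content;
    }

    ParseTree getParsedContent() {
        if (tree == null) tree = ParseTree.parse(this, content, version);
        return tree;
    }

    /** Callback from the tree; rejects edits to a tree made stale by setContent(). */
    void onTreeChanged(int treeVersion) {
        if (treeVersion != version) {
            throw new IllegalStateException("setContent() was called after this tree was obtained");
        }
        treeDirty = true;
    }
}

/** Hypothetical mutable tree; a flat list of nodes stands in for a real DOM. */
class ParseTree {
    private final Gadget owner;
    private final int version;
    private final List<String> nodes = new ArrayList<>();

    private ParseTree(Gadget owner, int version) {
        this.owner = owner;
        this.version = version;
    }

    static ParseTree parse(Gadget owner, String content, int version) {
        ParseTree t = new ParseTree(owner, version);
        for (String line : content.split("\n")) t.nodes.add(line);  // stand-in parser
        return t;
    }

    void setNode(int index, String text) {
        owner.onTreeChanged(version);  // notify (and staleness-check) before mutating
        nodes.set(index, text);
    }

    String serialize() { return String.join("\n", nodes); }
}
```

In this sketch an in-place edit costs one callback plus a deferred
re-serialization, which is the lightweight path argued for in the quoted
text; an immutable-tree design would instead build a new tree and hand it
back through something like setContent().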
