On Wed, Aug 27, 2008 at 2:12 AM, John Hjelmstad <[EMAIL PROTECTED]> wrote: > Agreed with these conclusions. There are a number of avenues open to us for > rewriter optimization, and per stats, working with the Caja team on an > optimized, ideally purpose-built-for-Shindig parser is tops on the list.
Caja also has optimisation on the road map. "Working" comes first, though :-) Curious what "purpose-built-for-Shindig" would entail? > Re: making ParseTreeDefaultContentRewriter the default... thanks! I plan to > do this, but FYI I'll be making the actual change only after cleaning up > plumbing a bit: making GadgetSpec immutable, then cleaning up the rewriter > interface to consume a (mutable) Gadget, which maintains its own parse tree > (per the original plan). This in turn will allow > ParseTreeDefaultContentRewriter to be split into its constituent parts (link > rewriter, js consolidator, style rewriter), and more importantly allow other > rewriters to modularly take advantage of the parse tree. Until then, the > parse tree is hidden from other rewriters. > Expect these changes soon, along with resuscitation of the discussion on how > best to handle caching of rewritten content. > --John > > On Mon, Aug 25, 2008 at 5:21 PM, Kevin Brown <[EMAIL PROTECTED]> wrote: >> >> I think any arguments one way or another contingent on caching should be >> ignored, because caching isn't always going to be possible (many >> makeRequest >> calls, proxied content, etc.) >> >> Instead, we should just make sure that the total time spent rewriting a >> given piece of content is under some target -- say 50-100ms. We can >> optimize >> aggressively to meet this goal, regardless of which approach we take. From >> the initial results, I think we can simply work with the caja team to >> improve the performance of the parser to bring it into that range. Having >> a >> parse tree is more useful than not having one, so we should go ahead and >> use >> the parser instead of the lexer and then just focus on making it better. >> >> On Mon, Aug 25, 2008 at 4:44 PM, Louis Ryan <[EMAIL PROTECTED]> wrote: >> >> > These numbers seem pretty solid so I believe its fine to switch the >> > general >> > rewriter over to this new code and ditch the lexer based stuff. Nice >> > work. >> > >> > On Mon, Aug 25, 2008 at 2:27 PM, John Hjelmstad <[EMAIL PROTECTED]> >> > wrote: >> > >> > > I've implemented a parse-tree based ContentRewriter using the existing >> > > plumbing (same caching semantics et al), as >> > ParseTreeDefaultContentRewriter >> > > and ParseTreeHtmlRewriter, respectively. The latter contains >> > > essentially >> > > all >> > > rewriting functionality from the previous HtmlTagTransformer classes. >> > > The >> > > parse-tree based rewriter is now functionally equivalent to the >> > > previous >> > > rewriter. The new classes use a Caja-based HTML parser under the hood. >> > > This proves out the functional viability of a tree-based rewriter, but >> > > performance also needs to be assessed. >> > > >> > > I've gone ahead and profiled the comparative performance of each >> > rewriter, >> > > "Lexer" based and "ParseTree" based. To no surprise, "Lexer" wins out >> > every >> > > time essentially by definition, since obviously Caja's parser uses its >> > own >> > > lexer under the hood. >> > > >> > > Summary: >> > > * The performance of each is fundamentally O(n), though... >> > > * For any given input size, Lexer-based rewriting averages between 2.5 >> > > - >> > > 3.5x faster than ParseTree-based (ie. c =~ 3.5 at worst). >> > > * By far, the majority of time involved in ParseTree-based >> > > optimization >> > is >> > > initial parsing: 75% of all processing. >> > > >> > > Details: >> > > 1. I wrote a simple rewriter/parser profiler which rewrites (the >> > > sample >> > > rewriter gadget's content * X repetitions) N times, recording the >> > resulting >> > > run time. The run time of parse-based rewriting degraded as N >> > > increased, >> > in >> > > all likelihood due to the additional cost of object management >> > (lexer-based >> > > rewriting involves few intermediate objects). Given that the results >> > > of >> > > rewriting will be variously cached, it's very unlikely that rewriting >> > will >> > > happen in immediate succession hundreds or thousands of times. As >> > > such, I >> > > fixed N = 1 to re-run the tests in relative isolation from one >> > > another. >> > > Results from a given run: >> > > >> > > LEX-BASED*100 rewriter, 1 runs in 177047 microsecs [177.04704] >> > > millis/run >> > > >> > > PARSE-BASED*100 rewriter, 1 runs in 609136 microsecs [609.136128] >> > > millis/run >> > > >> > > Parse/lex ratio: 3.4405327398939263 >> > > >> > > LEX-BASED*50 rewriter, 1 runs in 43936 microsecs [43.936] millis/run >> > > >> > > PARSE-BASED*50 rewriter, 1 runs in 148980 microsecs [148.979968] >> > millis/run >> > > >> > > Parse/lex ratio: 3.3908412235979606 >> > > >> > > LEX-BASED*10 rewriter, 1 runs in 3093 microsecs [3.092992] millis/run >> > > >> > > PARSE-BASED*10 rewriter, 1 runs in 11020 microsecs [11.020032] >> > > millis/run >> > > >> > > Parse/lex ratio: 3.5628839314581313 >> > > >> > > LEX-BASED*1 rewriter, 1 runs in 600 microsecs [0.600064] millis/run >> > > >> > > PARSE-BASED*1 rewriter, 1 runs in 1819 microsecs [1.819136] millis/run >> > > >> > > Parse/lex ratio: 3.0316666666666667 >> > > >> > > >> > > 2. Drilling down, I added simple operation profiling to each component >> > > of >> > > parse-tree rewriting: original parse (CajaHtmlParser); building >> > > mutable >> > > tree >> > > nodes; rewriting links; concatenating JS nodes; rewriting style >> > > blocks; >> > > rendering parse tree. I then reran the same tests. >> > > >> > > Results from subsequent run: >> > > >> > > LEX-BASED*100 rewriter, 1 runs in 165321 microsecs [165.32096] >> > > millis/run >> > > >> > > PARSE-BASED*100 rewriter, 1 runs in 646884 microsecs [646.88384] >> > millis/run >> > > >> > > Parse/lex ratio: 3.912896728183352 >> > > >> > > [PARSE OPS] >> > > >> > > Op[style-rewrite] min:25.419ms, max:25.419ms, avg:25.419ms >> > > >> > > Op[render] min:36.851ms, max:36.851ms, avg:36.851ms >> > > >> > > Op[js-rewrite] min:53.983ms, max:53.983ms, avg:53.983ms >> > > >> > > Op[link-rewrite] min:31.136ms, max:31.136ms, avg:31.136ms >> > > >> > > Op[build-nodes] min:32.929ms, max:32.929ms, avg:32.929ms >> > > >> > > Op[parse] min:464.211ms, max:464.211ms, avg:464.211ms >> > > >> > > >> > > LEX-BASED*50 rewriter, 1 runs in 30684 microsecs [30.683904] >> > > millis/run >> > > >> > > PARSE-BASED*50 rewriter, 1 runs in 161132 microsecs [161.132032] >> > millis/run >> > > >> > > Parse/lex ratio: 5.251336201277539 >> > > >> > > [PARSE OPS] >> > > >> > > Op[style-rewrite] min:8.581ms, max:8.581ms, avg:8.581ms >> > > >> > > Op[render] min:5.184ms, max:5.184ms, avg:5.184ms >> > > >> > > Op[js-rewrite] min:11.606ms, max:11.606ms, avg:11.606ms >> > > >> > > Op[link-rewrite] min:7.533ms, max:7.533ms, avg:7.533ms >> > > >> > > Op[build-nodes] min:3.41ms, max:3.41ms, avg:3.41ms >> > > >> > > Op[parse] min:121.367ms, max:121.367ms, avg:121.367ms >> > > >> > > >> > > LEX-BASED*10 rewriter, 1 runs in 3371 microsecs [3.371008] millis/run >> > > >> > > PARSE-BASED*10 rewriter, 1 runs in 10336 microsecs [10.336] millis/run >> > > >> > > Parse/lex ratio: 3.066152477009789 >> > > >> > > [PARSE OPS] >> > > >> > > Op[style-rewrite] min:0.563ms, max:0.563ms, avg:0.563ms >> > > >> > > Op[render] min:0.678ms, max:0.678ms, avg:0.678ms >> > > >> > > Op[js-rewrite] min:1.374ms, max:1.374ms, avg:1.374ms >> > > >> > > Op[link-rewrite] min:0.718ms, max:0.718ms, avg:0.718ms >> > > >> > > Op[build-nodes] min:0.295ms, max:0.295ms, avg:0.295ms >> > > >> > > Op[parse] min:6.466ms, max:6.466ms, avg:6.466ms >> > > >> > > >> > > LEX-BASED*1 rewriter, 1 runs in 592 microsecs [0.592128] millis/run >> > > >> > > PARSE-BASED*1 rewriter, 1 runs in 2083 microsecs [2.083072] millis/run >> > > >> > > Parse/lex ratio: 3.518581081081081 >> > > >> > > [PARSE OPS] >> > > >> > > Op[style-rewrite] min:0.082ms, max:0.082ms, avg:0.082ms >> > > >> > > Op[render] min:0.077ms, max:0.077ms, avg:0.077ms >> > > >> > > Op[js-rewrite] min:0.143ms, max:0.143ms, avg:0.143ms >> > > >> > > Op[link-rewrite] min:0.111ms, max:0.111ms, avg:0.111ms >> > > >> > > Op[build-nodes] min:0.043ms, max:0.043ms, avg:0.043ms >> > > >> > > Op[parse] min:1.437ms, max:1.437ms, avg:1.437ms >> > > >> > > >> > > 3. Drilling further, I wrote a separate test breaking out the >> > > performance >> > > components to parsing: calling the Caja DomParser.parseFragment(...) >> > > API, >> > > and subsequently wrapping the results of that call with ParsedHtmlNode >> > > objects to satisfy interface requirements: >> > > >> > > Typical run: >> > > >> > > Caja parser [size*1, runs:1] in 97538 microsecs [97.538048] millis/run >> > > >> > > [PARSER COMPONENTS] >> > > >> > > Op[raw-caja-parse] min:70.033ms, max:70.033ms, avg:70.033ms >> > > >> > > Op[build-parse-nodes] min:3.644ms, max:3.644ms, avg:3.644ms >> > > >> > > >> > > Caja parser [size*10, runs:1] in 42915 microsecs [42.915072] >> > > millis/run >> > > >> > > [PARSER COMPONENTS] >> > > >> > > Op[raw-caja-parse] min:34.676ms, max:34.676ms, avg:34.676ms >> > > >> > > Op[build-parse-nodes] min:7.148ms, max:7.148ms, avg:7.148ms >> > > >> > > >> > > Caja parser [size*50, runs:1] in 157048 microsecs [157.048064] >> > > millis/run >> > > >> > > [PARSER COMPONENTS] >> > > >> > > Op[raw-caja-parse] min:138.904ms, max:138.904ms, avg:138.904ms >> > > >> > > Op[build-parse-nodes] min:17.313ms, max:17.313ms, avg:17.313ms >> > > >> > > >> > > Caja parser [size*100, runs:1] in 236073 microsecs [236.07296] >> > > millis/run >> > > >> > > [PARSER COMPONENTS] >> > > >> > > Op[raw-caja-parse] min:173.743ms, max:173.743ms, avg:173.743ms >> > > >> > > Op[build-parse-nodes] min:43.295ms, max:43.295ms, avg:43.295ms >> > > >> > > >> > > Conclusions and Discussion: >> > > >> > > The purpose of this task was to prove that tree-based parsing is >> > > functionally viable, which has succeeded. Past that, it's a matter of >> > > choosing functionality vs. performance. Given that rewriting results >> > > are >> > > cached, perhaps even ~3x increase in rewriting cost will be worth >> > > paying. >> > > >> > > >> > > That's particularly true given the new class of optimizations/rewrites >> > made >> > > possible with a parse tree, as well as some bugs that are more easily >> > fixed >> > > using it. For instance, I recently discovered a bug with the existing >> > > JS >> > > tag >> > > rewriter which ignores type="..." attributes and doesn't maintain "id" >> > > attributes in certain situations. These can be resolved in the lexer >> > case, >> > > but are clearer in the parser one. >> > > >> > > >> > > Lastly, as mentioned at the beginning of this thread, I plan to >> > > maintain >> > > the >> > > ability to manipulate a gadget by string, meaning a lexer-based >> > > approach >> > > can >> > > still be used where desired and parse-tree isn't required. >> > > >> > > >> > > Next steps: >> > > >> > > 1. My next step is to add modularity to content rewriting, but again >> > > without >> > > changing any caching semantics. Instead, rather than a single >> > > ContentRewriter being injected, a ContentRewriterRegistry will be. The >> > > default Registry will support injection of a single ContentRewriter to >> > > maintain backward compatibility for now. >> > > >> > > 2. GadgetSpec immutability restored, ensuring post-rewritten caching. >> > > >> > > 3. ContentRewriter API cleanup. >> > > >> > > >> > > --John >> > > >> > > >> > > On Tue, Aug 12, 2008 at 7:43 PM, John Hjelmstad <[EMAIL PROTECTED]> >> > wrote: >> > > >> > > > Interesting idea, and sounds fine to me. Concretely, this lets me >> > > sidestep >> > > > SHINDIG-500 for a little while, which is nice (though I'd _really_ >> > > > like >> > > to >> > > > see the API cleanup go in! :)), in favor of migrating the existing >> > > rewriter >> > > > to a tree-based approach. Turns out I've been working on #1 and #2 >> > > > independently anyway. I'll post a patch soon. Thanks! >> > > > >> > > > John >> > > > >> > > > >> > > > On Tue, Aug 12, 2008 at 7:14 PM, Louis Ryan <[EMAIL PROTECTED]> >> > > > wrote: >> > > > >> > > >> Can we prove this out incrementally bottom-up. In general I think >> > using >> > > >> DOM >> > > >> is the right thing to do from a rewriting standpoint. So here's how >> > > >> I >> > > >> propose we proceed >> > > >> >> > > >> 1. If the Caja dom is a little awkward wrap it, if not lets just >> > > >> use >> > it >> > > as >> > > >> is. We can always resolve this later >> > > >> 2. Change the existing content rewriters to use the DOM instead of >> > > >> a >> > > >> lexer, >> > > >> should be pretty easy. Maybe add some fancier rewriting like moving >> > CSS >> > > >> into >> > > >> HEAD >> > > >> 3. Do some perf testing, look into memory overhead of dom >> > transformation >> > > >> etc. >> > > >> 4. Alter GadgetSpec's to retain the dom when they are cached >> > > >> 5. Alter the gadget rendering phase to serialize the content of the >> > dom >> > > to >> > > >> output >> > > >> 6. Annotate the dom at parse time to make render time user-pref >> > > >> substituions >> > > >> faster, this should be easy enough too... >> > > >> >> > > >> This should be enough to prove out the pipeline end-to-end and >> > identify >> > > >> any >> > > >> major perf niggles. Once this is done we can look into how to >> > > >> inject a >> > > >> rewriter pipeline into the parsing phase and the rendering phase. >> > > >> >> > > >> -Louis >> > > >> >> > > >> >> > > >> >> > > >> On Tue, Aug 12, 2008 at 5:57 PM, John Hjelmstad <[EMAIL PROTECTED]> >> > > wrote: >> > > >> >> > > >> > Re-responding in order to apply the last few exchanges to >> > > >> > google-caja-discuss@ (@gmail vs. @google membership issues). >> > > >> > >> > > >> > On Tue, Aug 12, 2008 at 4:48 PM, John Hjelmstad >> > > >> > <[EMAIL PROTECTED]> >> > > >> wrote: >> > > >> > >> > > >> > > Hello, >> > > >> > > >> > > >> > > While beginning to refactor the rewriter APIs I've discovered >> > > >> > > that >> > > >> there >> > > >> > > unfortunately is one semantic difference inherent to moving >> > > >> getContent() >> > > >> > and >> > > >> > > setContent() methods into the Gadget object (replacing >> > > >> > > View.get/setRewrittenContent()): BasicGadgetSpecFactory no >> > > >> > > longer >> > > >> caches >> > > >> > > rewritten content. >> > > >> > > >> > > >> > > I've written a discussion of this in issue SHINDIG-500, which >> > tracks >> > > >> this >> > > >> > > implementation sub-task: >> > > >> > https://issues.apache.org/jira/browse/SHINDIG-500 >> > > >> > > >> > > >> > > To summarize: >> > > >> > > 1. Is this change acceptable for the time being? >> > > >> > > 2. I suggest that we can, at a later date, move fetching of >> > > >> > > gadget >> > > >> specs >> > > >> > > into GadgetServer while injecting a Gadget(Spec) cache there as >> > > well, >> > > >> > > offering finer-tuned control over caching characteristics. >> > > >> > > >> > > >> > > Thanks, >> > > >> > > John >> > > >> > > >> > > >> > > >> > > >> > > On Mon, Aug 11, 2008 at 2:20 PM, John Hjelmstad >> > > >> > > <[EMAIL PROTECTED] >> > > >> > > >> > wrote: >> > > >> > > >> > > >> > >> I understand these concerns, and should be clear that I don't >> > > >> (despite >> > > >> > my >> > > >> > >> personal interest in experimenting with the idea, agreed that >> > > >> > >> we >> > > >> don't >> > > >> > have >> > > >> > >> time for it at the moment) have any plans to introduce this >> > > >> > >> sort >> > of >> > > >> RPC >> > > >> > >> anywhere - certainly not in Shindig itself, as any such call >> > would >> > > be >> > > >> > hidden >> > > >> > >> behind an interface anyway. >> > > >> > >> >> > > >> > >> Putting the RPC hypothetical aside, I still feel that there's >> > value >> > > >> to >> > > >> > >> implementing HTML parsing in terms of an interface: >> > > >> > >> * Clearer separation of concerns/boundary between projects. >> > > >> > >> - Corollary simplicity in testing. >> > > >> > >> * Clearer API for content manipulation (that doesn't require >> > > >> knowledge >> > > >> > of >> > > >> > >> Caja). >> > > >> > >> >> > > >> > >> I could be convinced otherwise, but at this point the code >> > involved >> > > >> > seems >> > > >> > >> of manageable size, so still worth doing. Thoughts? >> > > >> > >> >> > > >> > >> John >> > > >> > >> >> > > >> > >> >> > > >> > >> >> > > >> > >> On Mon, Aug 11, 2008 at 1:00 PM, Kevin Brown <[EMAIL PROTECTED]> >> > > >> wrote: >> > > >> > >> >> > > >> > >>> I agree with Louis -- that's just not practical. Every >> > > >> > >>> rewriting >> > > >> > >>> operation >> > > >> > >>> must work in real time. Caja's existing html parser is >> > > >> > >>> adequate >> > > for >> > > >> our >> > > >> > >>> needs, and we shouldn't go out of our way to tolerate every >> > oddity >> > > >> of >> > > >> > >>> random >> > > >> > >>> web browsers (especially as it simply wouldn't work unless >> > > >> > >>> you >> > > >> farmed >> > > >> > it >> > > >> > >>> out >> > > >> > >>> to *every* browser). Any new code needs to be grounded in >> > > practical, >> > > >> > >>> current >> > > >> > >>> needs, not theoretical options. We can always change code >> > > >> > >>> later >> > if >> > > >> we >> > > >> > >>> find a >> > > >> > >>> real need for something like that. We have real work to do in >> > the >> > > >> > >>> meantime. >> > > >> > >>> >> > > >> > >>> On Mon, Aug 11, 2008 at 12:06 PM, Louis Ryan >> > > >> > >>> <[EMAIL PROTECTED]> >> > > >> wrote: >> > > >> > >>> >> > > >> > >>> > John, >> > > >> > >>> > >> > > >> > >>> > From a practicality standpoint I'm a little nervous about >> > > >> > >>> > this >> > > >> plan >> > > >> > to >> > > >> > >>> make >> > > >> > >>> > RPCs calls out of a Java process to a native process to >> > > >> > >>> > fetch >> > a >> > > >> parse >> > > >> > >>> tree >> > > >> > >>> > for transformations that have to occur realtime. I don't >> > > >> > >>> > think >> > > the >> > > >> > >>> > motivating factor here is to accept all inputs that >> > > >> > >>> > browsers >> > > can. >> > > >> > >>> Gadget >> > > >> > >>> > developers will tailor their markup to the platform as they >> > have >> > > >> done >> > > >> > >>> > already. I would greatly prefer us to pick one 'good' >> > > >> > >>> > parser >> > and >> > > >> > stick >> > > >> > >>> with >> > > >> > >>> > it for all the manageability and consumability benefits >> > > >> > >>> > that >> > > come >> > > >> > with >> > > >> > >>> that >> > > >> > >>> > decision. Perhaps Im missing something here? >> > > >> > >>> > >> > > >> > >>> > -Louis >> > > >> > >>> > >> > > >> > >>> > On Mon, Aug 11, 2008 at 11:59 AM, John Hjelmstad < >> > > >> [EMAIL PROTECTED]> >> > > >> > >>> wrote: >> > > >> > >>> > >> > > >> > >>> > > On Fri, Aug 8, 2008 at 6:10 AM, Ben Laurie >> > > >> > >>> > > <[EMAIL PROTECTED] >> > > >> > > >> > wrote: >> > > >> > >>> > > >> > > >> > >>> > > > [+google-caja-discuss] >> > > >> > >>> > > > >> > > >> > >>> > > > On Thu, Aug 7, 2008 at 9:27 PM, John Hjelmstad < >> > > >> [EMAIL PROTECTED] >> > > >> > > >> > > >> > >>> > wrote: >> > > >> > >>> > > > > On Thu, Aug 7, 2008 at 3:20 AM, Ben Laurie < >> > > [EMAIL PROTECTED] >> > > >> > >> > > >> > >>> wrote: >> > > >> > >>> > > > > >> > > >> > >>> > > > >> On Wed, Aug 6, 2008 at 11:34 PM, John Hjelmstad < >> > > >> > >>> [EMAIL PROTECTED]> >> > > >> > >>> > > > wrote: >> > > >> > >>> > > > >> > This proposal effectively enables the renderer to >> > > become >> > > >> a >> > > >> > >>> > > multi-pass >> > > >> > >>> > > > >> > compiler for gadget content (essentially, >> > > >> > >>> > > > >> > arbitrary >> > web >> > > >> > >>> content). >> > > >> > >>> > > Such >> > > >> > >>> > > > a >> > > >> > >>> > > > >> > compiler can provide several benefits: static >> > > >> optimization >> > > >> > of >> > > >> > >>> > gadget >> > > >> > >>> > > > >> content >> > > >> > >>> > > > >> > (auto-proxying of images, whitespace/comment >> > > >> > >>> > > > >> > removal, >> > > >> > >>> > consolidation >> > > >> > >>> > > of >> > > >> > >>> > > > >> CSS >> > > >> > >>> > > > >> > blocks), security benefits (caja et al), new >> > > >> functionality >> > > >> > >>> > > (annotation >> > > >> > >>> > > > of >> > > >> > >>> > > > >> > content for stats, document analysis, >> > > container-specific >> > > >> > >>> > features), >> > > >> > >>> > > > etc. >> > > >> > >>> > > > >> To >> > > >> > >>> > > > >> > my knowledge no such infrastructure exists today >> > (with >> > > >> the >> > > >> > >>> > possible >> > > >> > >>> > > > >> > exception of Caja itself, which I'd like to >> > > >> > >>> > > > >> > dovetail >> > > with >> > > >> > this >> > > >> > >>> > > work). >> > > >> > >>> > > > >> >> > > >> > >>> > > > >> Caja clearly provides a large chunk of the code >> > > >> > >>> > > > >> you'd >> > > need >> > > >> for >> > > >> > >>> this. >> > > >> > >>> > > > >> I'd like to hear how we'd manage to avoid >> > > >> > >>> > > > >> duplication >> > > >> between >> > > >> > >>> the >> > > >> > >>> > two >> > > >> > >>> > > > >> projects. >> > > >> > >>> > > > >> >> > > >> > >>> > > > >> A generalised framework for manipulating content >> > > >> > >>> > > > >> sounds >> > > >> like a >> > > >> > >>> great >> > > >> > >>> > > > >> idea, but probably should not live in either of the >> > > >> > >>> > > > >> two >> > > >> > projects >> > > >> > >>> > (Caja >> > > >> > >>> > > > >> and Shindig) but rather should be shared by both of >> > them, >> > > I >> > > >> > >>> suspect. >> > > >> > >>> > > > > >> > > >> > >>> > > > > >> > > >> > >>> > > > > I agree on both counts. As I mentioned, the piece of >> > this >> > > >> idea >> > > >> > >>> that I >> > > >> > >>> > > > expect >> > > >> > >>> > > > > to change the most is the parse tree, and Caja's >> > > >> .parser.html >> > > >> > and >> > > >> > >>> > > > > .parser.css packages contain much of what I've thrown >> > > >> > >>> > > > > in >> > > >> here >> > > >> > as >> > > >> > >>> a >> > > >> > >>> > > base. >> > > >> > >>> > > > > >> > > >> > >>> > > > > My key requirements are: >> > > >> > >>> > > > > * Lightweight framework. >> > > >> > >>> > > > > * Parser modularity, mostly for HTML parsers (to >> > > >> > >>> > > > > re-use >> > > the >> > > >> > good >> > > >> > >>> work >> > > >> > >>> > > > done >> > > >> > >>> > > > > by WebKit or Gecko.. CSS/JS can come direct from Caja >> > I'd >> > > >> bet) >> > > >> > >>> > > > > * Automatic maintenance of DOM<->String conversion. >> > > >> > >>> > > > > * Easy to manipulate structure. >> > > >> > >>> > > > >> > > >> > >>> > > > I'm not sure what the value of parser modularity is? If >> > the >> > > >> > >>> resulting >> > > >> > >>> > > > tree is different, then that's a problem for people >> > > processing >> > > >> > the >> > > >> > >>> > > > tree. And if it is not, then why do we care? >> > > >> > >>> > > >> > > >> > >>> > > >> > > >> > >>> > > IMO the value of parser modularity is that the lenient >> > parsers >> > > >> > native >> > > >> > >>> to >> > > >> > >>> > > browsers can be used in place of those that might not >> > > >> > >>> > > accept >> > > all >> > > >> > >>> inputs. >> > > >> > >>> > > One >> > > >> > >>> > > could (and I'd like to) adapt WebKit or Gecko's parsing >> > > >> > >>> > > code >> > > >> into a >> > > >> > >>> > server >> > > >> > >>> > > that runs parallel to Shindig and provides a "local RPC" >> > > service >> > > >> > for >> > > >> > >>> > > parsing >> > > >> > >>> > > semi-structured HTML. The resulting tree for WebKit's >> > > >> > >>> > > parser >> > > >> might >> > > >> > be >> > > >> > >>> > > different than that for an XHTML parser, Gecko's parser, >> > etc, >> > > >> but >> > > >> > if >> > > >> > >>> the >> > > >> > >>> > > algorithm implemented atop it is rule-based rather than >> > > >> > >>> strict-structure >> > > >> > >>> > > based that should be fine, no? >> > > >> > >>> > > >> > > >> > >>> > > >> > > >> > >>> > > > >> > > >> > >>> > > > >> > > >> > >>> > > > > >> > > >> > >>> > > > > I'd love to see both projects share the same base >> > > >> > >>> > > > > syntax >> > > >> tree >> > > >> > >>> > > > > representations. I considered .parser.html(.DomTree) >> > > >> > >>> > > > > and >> > > >> > >>> .parser.css >> > > >> > >>> > > for >> > > >> > >>> > > > > these, but at the moment these appeared to be a >> > > >> > >>> > > > > little >> > > more >> > > >> > tied >> > > >> > >>> to >> > > >> > >>> > > > Caja's >> > > >> > >>> > > > > lexer/parser implementation than I preferred (though >> > > >> > >>> > > > > I >> > > admit >> > > >> > >>> > > > > AbstractParseTreeNode contains most of what's >> > > >> > >>> > > > > needed). >> > > >> > >>> > > > > >> > > >> > >>> > > > > To be sure, I don't see this as an end-all-be-all >> > > >> > transformation >> > > >> > >>> > system >> > > >> > >>> > > > in >> > > >> > >>> > > > > any way. I'd just like to put *something* reasonable >> > > >> > >>> > > > > in >> > > >> place >> > > >> > >>> that we >> > > >> > >>> > > can >> > > >> > >>> > > > > play with, provide some benefit, and enhance into a >> > truly >> > > >> > >>> > sophisticated >> > > >> > >>> > > > > vision of document rewriting. >> > > >> > >>> > > > > >> > > >> > >>> > > > > >> > > >> > >>> > > > >> >> > > >> > >>> > > > >> >> > > >> > >>> > > > >> > c. Add Gadget.getParsedContent(). >> > > >> > >>> > > > >> > i. Returns a mutable GadgetContentParseTree >> > > >> > >>> > > > >> > used >> > to >> > > >> > >>> manipulate >> > > >> > >>> > > > Gadget >> > > >> > >>> > > > >> > Contents. >> > > >> > >>> > > > >> > ii. Mutable tree calls back to the Gadget >> > > >> > >>> > > > >> > object >> > > >> > indicating >> > > >> > >>> > when >> > > >> > >>> > > > any >> > > >> > >>> > > > >> > change is made, and emits an error if setContent() >> > has >> > > >> been >> > > >> > >>> called >> > > >> > >>> > > in >> > > >> > >>> > > > the >> > > >> > >>> > > > >> > interim. >> > > >> > >>> > > > >> >> > > >> > >>> > > > >> In Caja we have been moving towards immutable >> > > >> > >>> > > > >> trees... >> > > >> > >>> > > > > >> > > >> > >>> > > > > >> > > >> > >>> > > > > Interested to hear more about this. The whole idea is >> > for >> > > >> the >> > > >> > >>> > gadget's >> > > >> > >>> > > > tree >> > > >> > >>> > > > > representation to be modifiable. Doing that with >> > immutable >> > > >> > trees >> > > >> > >>> to >> > > >> > >>> > me >> > > >> > >>> > > > > suggests that a rewriter would have to create a >> > completely >> > > >> new >> > > >> > >>> tree >> > > >> > >>> > and >> > > >> > >>> > > > set >> > > >> > >>> > > > > it as a representation of new content. That's >> > > >> > >>> > > > > convenient >> > > as >> > > >> far >> > > >> > >>> as >> > > >> > >>> > the >> > > >> > >>> > > > > Gadget's maintenance of String<->Tree representations >> > > >> > >>> > > > > is >> > > >> > >>> concerned... >> > > >> > >>> > > but >> > > >> > >>> > > > > seems pretty heavyweight for many types of edits: >> > in-situ >> > > >> > >>> > modifications >> > > >> > >>> > > > of >> > > >> > >>> > > > > text, content reordering, etc. That's particularly so >> > > >> > >>> > > > > in >> > a >> > > >> > >>> > > > single-threaded >> > > >> > >>> > > > > (viz rewriting) environment. >> > > >> > >>> > > > >> > > >> > >>> > > > Never having been entirely sold on the concept, I'll >> > > >> > >>> > > > let >> > > those >> > > >> on >> > > >> > >>> the >> > > >> > >>> > > > Caja team who advocate immutability explain why. >> > > >> > >>> > > > >> > > >> > >>> > > >> > > >> > >>> > >> > > >> > >>> >> > > >> > >> >> > > >> > >> >> > > >> > > >> > > >> > >> > > >> >> > > > >> > > > >> > > >> > > > > --~--~---------~--~----~------------~-------~--~----~ > You received this message because you are subscribed to > http://groups.google.com/group/google-caja-discuss > To unsubscribe, email [EMAIL PROTECTED] > -~----------~----~----~----~------~----~------~--~--- > >

