It probably wouldn't hurt to try some HTML parsers other than Caja as a performance test. NekoHTML seems to be a good candidate: it's Apache 2 licensed and seems to perform decently. I took a quick scan of the code and it looks pretty reasonable. See http://nekohtml.svn.sourceforge.net/viewvc/nekohtml/trunk/doc/index.html?revision=194
Some comparative benchmarks and samples (usual disclaimer applies) http://www.portletbridge.org/saxbenchmark/results.html

On Fri, Oct 3, 2008 at 11:48 AM, Kevin Brown <[EMAIL PROTECTED]> wrote:

> On Fri, Oct 3, 2008 at 10:07 AM, John Hjelmstad <[EMAIL PROTECTED]> wrote:
>
> > On Fri, Oct 3, 2008 at 9:50 AM, Paul Lindner <[EMAIL PROTECTED]> wrote:
> >
> > > It seems to me that parsing Gadget XML should be the least of our worries,
> > > especially if you can insure that the content that's hitting the browser
> > > only varies by country and language and view.
> >
> > Correct me if I'm wrong, but that would seem to be thrown out the window
> > with proxied rendering.
> >
> > > By moving the parentUrl to the hash I've been able to accomplish this for
> > > hi5 -- iframes are cached at the browser and CDN level.
> > >
> > > That said this leave out UserPref support, but I'm fine with that as a
> > > tradeoff.
> > >
> > > Perhaps we should focus on delivering the application as one cacheable
> > > chunk and the per-user/preload data in a second chunk?
> >
> > That's actually what type=html applications do, even with __UP substitution
> > (since the UPs are passed on the querystring). Likewise any
> > OpenSocial-templated applications. But proxied throws that out as well,
> > unless we add some caching headers that proxied content rendering can pass
> > back telling us the content is cacheable for all users (eg. it contains only
> > substitution constructs such as templates).
>
> You can't ever do that because you need a security token to do the proxied
> rendering.
>
> > Thinking about this some more:
> > 1. It seems unlikely that parsing cost, even if 25ms, will be a substantial
> > component to proxied content rendering latency. Getting the actual,
> > uncacheable data from the app server will be, unless it's specially
> > indicated as cacheable in the first place - at which point we may as well
> > cache the parsed tree as a small optimization. IMO that just calls for
> > passing isCacheable to a caching parser.
>
> The data itself is frequently cacheable, especially if it's owner keyed --
> but you MUST validate the security token for this data, because it contains
> user information. If you visit the same profile 100x in a row, the data from
> the remote site is still cached, it's just that the iframe isn't.
>
> > 2. FWIW, the parsing time of 25ms for BuddyPoke (as with the other numbers)
> > comes from my developer workstation running the test in Eclipse. The numbers
> > are intended to be relative. Kevin -- under which environment did you see
> > 10ms cajoling?
>
> Running shindig on my own workstation through the YourKit java profiler I
> got 22ms for link rewriting (after making modifications so that it worked
> correctly) and 9ms for cajoling.
>
> > Paul, what's your plan for dealing with proxied content latency?
> >
> > -John
> >
> > > On Oct 3, 2008, at 9:35 AM, John Hjelmstad wrote:
> > >
> > >> Hi Ian:
> > >> You're right, it's the gadget XML parse prior to manipulation. It's doing
> > >> DOM-based parsing, and I suspect you're right about the load of small
> > >> objects involved. At present I see that as a requirement, though, to deal
> > >> with semi-well-formed input. We've talked about requiring XHTML or something
> > >> close to it as a prerequisite for rewriting - which would make parsing
> > >> vastly easier and rather trivial to implement - but that's a spec issue if
> > >> it's to be a general platform requirement.
> > >>
> > >> --John
> > >>
> > >> On Fri, Oct 3, 2008 at 12:53 AM, Ian Boston <[EMAIL PROTECTED]> wrote:
> > >>
> > >>> I don't know the precise details of this conversation or exactly which
> > >>> parsing, CajaHtmlParser or XmlUtil.parse, you are talking about, but if its
> > >>> the Gadget XML parse prior to manipulation, and this is still using DOM
> > >>> based parsing, then its probably going to be slower than SAX under load, and
> > >>> vastly slower than Stax. The reason I say under load, is that DOM parsers
> > >>> tend emit lots of small objects which, once they get out of eden, overload
> > >>> the GC which will dominate as resources become scarce. Having said that,
> > >>> gadget parse trees probably don't exist long enough to get out of eden.
> > >>>
> > >>> Ignore me if you are talking about some other parsing going on within
> > >>> gadgets.
> > >>> Ian
> > >>>
> > >>> On 3 Oct 2008, at 02:03, Kevin Brown wrote:
> > >>>
> > >>>> The real thing we should be investigating is why it takes 25ms to use the
> > >>>> parser on buddypoke when it only takes 10ms to cajole it.
> > >
> > > Paul Lindner
> > > [EMAIL PROTECTED]
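
For anyone following the DOM vs. SAX/StAX point in the quoted thread, here is a minimal sketch of the difference, using only the parsers that ship with the JDK. The class name ParseStyleSketch and the sample gadget XML in the usage note are hypothetical; this is not Shindig's parsing code, and it makes no claim about relative timings.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamReader;
import org.w3c.dom.Document;

// Hypothetical illustration class; not part of Shindig.
final class ParseStyleSketch {

    // DOM: the whole document is materialized as a tree, one object per node,
    // before any work is done on it.
    static int countElementsDom(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            // "*" matches every element, including the document element.
            return doc.getElementsByTagName("*").getLength();
        } catch (Exception e) {
            throw new IllegalStateException("DOM parse failed", e);
        }
    }

    // StAX: events are pulled one at a time and no tree is retained.
    static int countElementsStax(String xml) {
        try {
            XMLStreamReader reader = XMLInputFactory.newInstance()
                    .createXMLStreamReader(
                            new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            int count = 0;
            try {
                while (reader.hasNext()) {
                    if (reader.next() == XMLStreamConstants.START_ELEMENT) {
                        count++;
                    }
                }
            } finally {
                reader.close();
            }
            return count;
        } catch (Exception e) {
            throw new IllegalStateException("StAX parse failed", e);
        }
    }
}
```

Both methods give the same count for something like <Module><ModulePrefs title="x"/><Content type="html">hi</Content></Module>; the difference is only in how many objects are allocated along the way. Both also assume well-formed XML, which, as John points out above, can't be assumed for gadget content.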

