Probably wouldn't hurt to look into some alternative HTML parsers other than
Caja as a performance test. NekoHTML seems to be a good candidate: it's
Apache 2 licensed and seems to perform decently. I took a quick scan of the
code and it looks pretty reasonable.
See
http://nekohtml.svn.sourceforge.net/viewvc/nekohtml/trunk/doc/index.html?revision=194

Some comparative benchmarks and samples (usual disclaimer applies):

http://www.portletbridge.org/saxbenchmark/results.html
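A rough timing harness for that kind of parser comparison, sketched with only the JDK's DOM (DocumentBuilder) and StAX (XMLStreamReader) parsers as stand-ins, since NekoHTML and Caja need their own jars. Both JDK parsers require well-formed input, so this only covers the XHTML-like case discussed later in the thread. The class name, sample markup and iteration counts are made up for illustration, and a serious comparison would want a proper benchmark harness rather than System.nanoTime loops.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamReader;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Illustrative harness; class name and sample markup are hypothetical.
public class ParserBench {

    /** Builds a full DOM tree (one object per node) and counts its elements. */
    public static int countWithDom(String xml) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = builder.parse(new InputSource(new StringReader(xml)));
        return doc.getElementsByTagName("*").getLength();
    }

    /** Streams through the same markup with a StAX cursor; no tree is kept. */
    public static int countWithStax(String xml) throws Exception {
        XMLStreamReader reader = XMLInputFactory.newInstance().createXMLStreamReader(new StringReader(xml));
        int count = 0;
        try {
            while (reader.hasNext()) {
                if (reader.next() == XMLStreamConstants.START_ELEMENT) {
                    count++;
                }
            }
        } finally {
            reader.close();
        }
        return count;
    }

    /** A parse strategy under test. */
    public interface ParseTask {
        int run(String xml) throws Exception;
    }

    /** Average nanoseconds per parse, measured after a JIT warm-up phase. */
    public static long averageNanos(ParseTask task, String xml, int warmup, int iterations) throws Exception {
        for (int i = 0; i < warmup; i++) {
            task.run(xml);
        }
        long start = System.nanoTime();
        for (int i = 0; i < iterations; i++) {
            task.run(xml);
        }
        return (System.nanoTime() - start) / iterations;
    }

    /** Well-formed sample: html, body, then n (div, a) pairs, i.e. 2n + 2 elements. */
    public static String sampleMarkup(int n) {
        StringBuilder sb = new StringBuilder("<html><body>");
        for (int i = 0; i < n; i++) {
            sb.append("<div class=\"c").append(i).append("\"><a href=\"/x")
              .append(i).append("\">link</a></div>");
        }
        return sb.append("</body></html>").toString();
    }

    public static void main(String[] args) throws Exception {
        String xml = sampleMarkup(2000);
        System.out.println("DOM  ns/parse: " + averageNanos(ParserBench::countWithDom, xml, 50, 200));
        System.out.println("StAX ns/parse: " + averageNanos(ParserBench::countWithStax, xml, 50, 200));
    }
}
```

Counting elements in both cases keeps the work comparable; the difference is that the DOM path allocates and retains a node per element, which is the allocation pressure Ian raises further down.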

On Fri, Oct 3, 2008 at 11:48 AM, Kevin Brown <[EMAIL PROTECTED]> wrote:

> On Fri, Oct 3, 2008 at 10:07 AM, John Hjelmstad <[EMAIL PROTECTED]> wrote:
>
> > On Fri, Oct 3, 2008 at 9:50 AM, Paul Lindner <[EMAIL PROTECTED]> wrote:
> >
> > > It seems to me that parsing Gadget XML should be the least of our
> > > worries, especially if you can ensure that the content that's hitting
> > > the browser only varies by country, language and view.
> >
> >
> > Correct me if I'm wrong, but that would seem to be thrown out the window
> > with proxied rendering.
> >
> >
> > >
> > >
> > > By moving the parentUrl to the hash I've been able to accomplish this
> > > for hi5 -- iframes are cached at the browser and CDN level.
> > >
> > > That said, this leaves out UserPref support, but I'm fine with that as
> > > a tradeoff.
> > >
> > > Perhaps we should focus on delivering the application as one cacheable
> > > chunk and the per-user/preload data in a second chunk?
> >
> >
> > That's actually what type=html applications do, even with __UP
> > substitution (since the UPs are passed on the querystring). The same goes
> > for any OpenSocial-templated applications. But proxied throws that out as
> > well, unless we add some caching headers that proxied content rendering
> > can pass back telling us the content is cacheable for all users (e.g. it
> > contains only substitution constructs such as templates).
>
>
> You can't ever do that because you need a security token to do the proxied
> rendering.
>
>
> > Thinking about this some more:
> > 1. It seems unlikely that parsing cost, even if 25ms, will be a
> > substantial component of proxied content rendering latency. Getting the
> > actual, uncacheable data from the app server will be, unless it's
> > specially indicated as cacheable in the first place - at which point we
> > may as well cache the parsed tree as a small optimization. IMO that just
> > calls for passing isCacheable to a caching parser.
>
>
> The data itself is frequently cacheable, especially if it's owner-keyed --
> but you MUST validate the security token for this data, because it contains
> user information. If you visit the same profile 100x in a row, the data
> from the remote site is still cached, it's just that the iframe isn't.
>
>
> > 2. FWIW, the parsing time of 25ms for BuddyPoke (as with the other
> > numbers) comes from my developer workstation running the test in
> > Eclipse. The numbers are intended to be relative. Kevin -- under which
> > environment did you see 10ms cajoling?
>
>
> Running shindig on my own workstation through the YourKit java profiler I
> got 22ms for link rewriting (after making modifications so that it worked
> correctly) and 9ms for cajoling.
>
>
> >
> >
> > Paul, what's your plan for dealing with proxied content latency?
> >
> > -John
> >
> >
> > >
> > >
> > >
> > > On Oct 3, 2008, at 9:35 AM, John Hjelmstad wrote:
> > >
> > >> Hi Ian:
> > >> You're right, it's the gadget XML parse prior to manipulation. It's
> > >> doing DOM-based parsing, and I suspect you're right about the load of
> > >> small objects involved. At present I see that as a requirement, though,
> > >> to deal with semi-well-formed input. We've talked about requiring XHTML
> > >> or something close to it as a prerequisite for rewriting - which would
> > >> make parsing vastly easier and rather trivial to implement - but that's
> > >> a spec issue if it's to be a general platform requirement.
> > >>
> > >> --John
> > >>
> > >> On Fri, Oct 3, 2008 at 12:53 AM, Ian Boston <[EMAIL PROTECTED]> wrote:
> > >>
> > >>> I don't know the precise details of this conversation or exactly which
> > >>> parsing, CajaHtmlParser or XmlUtil.parse, you are talking about, but if
> > >>> it's the Gadget XML parse prior to manipulation, and this is still using
> > >>> DOM-based parsing, then it's probably going to be slower than SAX under
> > >>> load, and vastly slower than StAX. The reason I say under load is that
> > >>> DOM parsers tend to emit lots of small objects which, once they get out
> > >>> of eden, overload the GC, which will dominate as resources become
> > >>> scarce. Having said that, gadget parse trees probably don't exist long
> > >>> enough to get out of eden.
> > >>>
> > >>> Ignore me if you are talking about some other parsing going on within
> > >>> gadgets.
> > >>> Ian
> > >>>
> > >>>
> > >>> On 3 Oct 2008, at 02:03, Kevin Brown wrote:
> > >>>
> > >>>> The real thing we should be investigating is why it takes 25ms to use
> > >>>> the parser on buddypoke when it only takes 10ms to cajole it.
> > >>>>
> > >>>>
> > >>>
> > >>>
> > > Paul Lindner
> > > [EMAIL PROTECTED]
> > >
> > >
> > >
> > >
> >
>
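On John's point about passing isCacheable to a caching parser: a minimal sketch, assuming a hypothetical CachingParser that wraps whatever parser is in use (the delegate Function stands in for the real one; none of these names are existing Shindig classes). Only content explicitly flagged as identical for all users is cached, keyed by a SHA-256 of the content; per-user proxied output always goes straight to the parser, and, per Kevin, the security token still has to be validated before any of this applies.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

/** Hypothetical wrapper: reuses parse results only for content flagged as cacheable. */
public class CachingParser<T> {
    private final Function<String, T> delegate; // stands in for the real parser
    private final Map<String, T> cache;

    public CachingParser(Function<String, T> delegate, int maxEntries) {
        this.delegate = delegate;
        // Access-ordered LinkedHashMap that evicts the least recently used entry.
        this.cache = Collections.synchronizedMap(new LinkedHashMap<String, T>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<String, T> eldest) {
                return size() > maxEntries;
            }
        });
    }

    /**
     * Parses content, reusing a cached result only when isCacheable is true,
     * i.e. the content is known to be identical for all users.
     */
    public T parse(String content, boolean isCacheable) {
        if (!isCacheable) {
            return delegate.apply(content); // per-user content is never shared
        }
        return cache.computeIfAbsent(sha256Hex(content), k -> delegate.apply(content));
    }

    private static String sha256Hex(String content) {
        try {
            byte[] hash = MessageDigest.getInstance("SHA-256")
                    .digest(content.getBytes(StandardCharsets.UTF_8));
            StringBuilder sb = new StringBuilder();
            for (byte b : hash) {
                sb.append(String.format("%02x", b));
            }
            return sb.toString();
        } catch (NoSuchAlgorithmException e) {
            // Every Java platform is required to provide SHA-256.
            throw new IllegalStateException("SHA-256 unavailable", e);
        }
    }

    public static void main(String[] args) {
        int[] calls = {0};
        CachingParser<Integer> parser = new CachingParser<>(s -> { calls[0]++; return s.length(); }, 100);
        parser.parse("<div>shared</div>", true);
        parser.parse("<div>shared</div>", true);    // served from the cache
        parser.parse("<div>per-user</div>", false);
        parser.parse("<div>per-user</div>", false); // parsed again
        System.out.println("delegate calls: " + calls[0]); // prints "delegate calls: 3"
    }
}
```

The LRU bound via an access-ordered LinkedHashMap keeps the sketch dependency-free. If the cached value is a mutable DOM that rewriters modify in place, callers would need to copy it before rewriting, or the cache would leak one request's rewrites into the next.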
