The switch to htmlparser is something I've been planning to do for quite
a while. We're currently waiting for Mike et al. to fix some issues in their
CSS DOM before I go ahead and make the switch, which has significant
benefits for our sanitization and cajoling pipelines. I believe there is a
CL out for review that fixes this in Caja.

On Wed, Jul 8, 2009 at 3:46 AM, Paul Lindner <plind...@linkedin.com> wrote:

> I filed https://issues.apache.org/jira/browse/SHINDIG-1107
> Does anyone have any opinion about cleaning up those dependencies?  We were
> pulling in json-lib which seems unnecessary since we have a native json
> serializer in place now.
>
> Another simplification is deprecating nekohtml in favor of htmlparser, which
> is used by Caja. I asked the Caja folks about using neko and this was their
> response:
>
> htmlparser was recommended by Ian Hickson, author of large chunks of the
> HTML5 spec, as conforming closely to the spec. Nekohtml is indeed quite fast,
> but htmlparser does a better job of accurately producing the kind of DOM that
> you would get in an actual browser (which is what we're trying to codify)
> when parsing tag soup.
>
> Mike Samuel looked at nekohtml more recently (primarily to see if we could
> benefit from faster parsing by neko) and improved our own parsing speed to a
> point where it is comparable to neko. I am not sure I fully follow the
> benefit of removing the dependency on icu4j.
>
