I've been planning the switch to htmlparser for quite a while. We're currently waiting for Mike et al. to fix some issues in their CSS DOM before I go ahead with it. The switch has significant benefits for our sanitization and cajoling pipelines. I believe there is a CL out for review on Caja that fixes those issues.
On Wed, Jul 8, 2009 at 3:46 AM, Paul Lindner <plind...@linkedin.com> wrote:
> I filed https://issues.apache.org/jira/browse/SHINDIG-1107
>
> Does anyone have any opinion about cleaning up those dependencies? We were
> pulling in json-lib which seems unnecessary since we have a native json
> serializer in place now.
>
> Another simplification is deprecating nekohtml for htmlparser, which is
> used by caja. I asked the caja folks about using neko and this was their
> response:
>
> htmlparser was recommended by Ian Hickson, author of large chunks of the
> HTML5 spec as conforming closely to the spec. Nekohtml is indeed quite
> fast but htmlparser does a better job of more accurately producing the
> kind of DOM that you would get in an actual browser (which is what we're
> trying to codify) when parsing tag soup.
>
> Mike Samuel looked at nekohtml more recently (primarily to see if we
> could benefit from faster parsing by neko) and improved our own parsing
> speed to a point where it is comparable to neko. I am not sure I fully
> follow the benefit of removing dependency on icu4j.