Hello,

how about two-stage processing? Load the HTML into WebKitGTK, dump the DOM (https://webkitgtk.org/reference/webkit2gtk/stable/WebKitWebPage.html#webkit-web-page-get-dom-document), which already contains the parsed structure, sanitize that DOM, and then serialize the modified DOM for display and later use?
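
To make the parse, sanitize, re-serialize flow concrete, here is a rough sketch in Python that uses only the standard library. It is not WebKitGTK code: in the variant described above the engine would do the parsing and the sanitizing would run on its DOM. The names `Sanitizer`, `sanitize`, `DROP_ELEMENTS` and `VOID_ELEMENTS` are made up for this illustration, and the filtering policy is only an assumption, far from a complete one. Unlike a real browser parser, this sketch also does not repair unbalanced tags.

```python
from html import escape
from html.parser import HTMLParser

# Hypothetical policy, for illustration only; a mail client needs a much
# stricter allow-list than this.
DROP_ELEMENTS = {"script", "style", "iframe", "object"}
VOID_ELEMENTS = {"br", "hr", "img", "input", "meta", "link"}


class Sanitizer(HTMLParser):
    """Stage 1 parses tag soup; stage 2 filters and re-serializes it."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []        # serialized output fragments
        self.skip_depth = 0  # > 0 while inside a dropped element

    def handle_starttag(self, tag, attrs):
        if tag in DROP_ELEMENTS:
            self.skip_depth += 1
            return
        if self.skip_depth:
            return
        parts = [tag]
        for name, value in attrs:
            if name.startswith("on"):
                continue  # drop inline event handlers such as onclick
            if value is None:
                parts.append(name)  # attribute without a value
                continue
            if name in ("href", "src") and \
                    value.strip().lower().startswith("javascript:"):
                continue  # drop script URLs
            parts.append('%s="%s"' % (name, escape(value, quote=True)))
        self.out.append("<" + " ".join(parts) + ">")

    def handle_endtag(self, tag):
        if tag in DROP_ELEMENTS:
            if self.skip_depth:
                self.skip_depth -= 1
            return
        if self.skip_depth or tag in VOID_ELEMENTS:
            return
        self.out.append("</%s>" % tag)

    def handle_data(self, data):
        # Text arrives with character references decoded, so escape it again.
        if not self.skip_depth:
            self.out.append(escape(data, quote=False))


def sanitize(html):
    """Return a filtered, re-serialized copy of the given HTML string."""
    parser = Sanitizer()
    parser.feed(html)
    parser.close()  # flush any buffered trailing text
    return "".join(parser.out)


if __name__ == "__main__":
    print(sanitize('<p onclick="x()">Hi<script>alert(1)</script></p>'))
```

Comments are dropped as a side effect, because the parser's default comment handler does nothing. An element from the drop list that is never closed causes everything after it to be discarded, which errs on the safe side.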
It should be more secure, too.

m.

2016-08-01 10:01 GMT+02:00 Michael Gratton <m...@vee.net>:

>
> Hey all,
>
> I'm looking for an HTML tag soup library for Geary, that can load tag soup
> HTML (i.e. possibly malformed) from a stream, allow some manipulation of
> it, and re-serialise it for display in WebKitGTK. Ideally, a pull-parser
> API like libxml2's TextReader or StAX[0] would be great, so the whole
> document does not need to be kept in memory as it is processed.
>
> These are the ones I know about:
>
> libxml2:
> - Pros: Has a pull parser API, has a HTML4 tag soup parser, installed
>   everywhere
> - Cons: Pull parser doesn't work with HTML parser without reading whole
>   document into memory, HTML parser out of date(?)
>
> GXml:
> - Pros: Nice Vala API, uses libxml2 under the hood
> - Cons: Not a pull parser, loads whole document into memory, doesn't seem
>   to be packaged for any distros, doesn't use the libxml HTML parser(?)
>
> Others:
> - WebKitGTK+: Great tag soup parser, no pull API, doesn't allow
>   manipulating the markup before displaying it (which is the main reason I
>   need to parse the HTML beforehand)
> - XML Bird: Nice Vala API, but not a pull parser or a HTML parser
>
> So none of these seem to completely fit the bill. Are there any other
> options out there that I have missed? Has anyone else had to parse tag soup in
> Vala?
>
> Ta!
> //Mike
>
> [0] - <https://en.wikipedia.org/wiki/StAX>
>
> --
> ⊨ Michael Gratton, Percept Wrangler.
> ⚙ <http://mjog.vee.net/>
>
>
> _______________________________________________
> vala-list mailing list
> vala-list@gnome.org
> https://mail.gnome.org/mailman/listinfo/vala-list
>