Re: [Wikitech-l] [REQ] A mixed xml/html parser for skinning

Dmitriy Sintsov Thu, 10 Feb 2011 01:55:03 -0800

* Daniel Friesen <[email protected]> [Thu, 10 Feb 2011 01:37:18 -0800]:
> I've been experimenting with a mixed xml/html based template syntax for
> skinning[1].
> However I've been having issues with the parsing of it.
>
> - DOMDocument::loadHTML throws warnings, and when I output the result it
> strips out namespaces, turning <mw:foo> into <foo>
> - SimpleHTMLDOM was the most promising, in fact my current experiments
> got very far with it, however when I got to the need to insert a node
> before/after an element it completely messed up. I'm also not optimistic
> about its performance, since there are no dom operations and its "insert"
> is essentially "concatenate some html with the outertext and set
> outertext to it"
> - html5lib choked on namespaces, presumably apart from its built-in
> handling of things like svg:.
> - phpQuery is just a wrapper around DOMDocument
> - tidy's plugin is supposed to support dom parsing, but that is not
> deployed on every server, and even people using tidy through mw might
> not be using the plugin since we support the executable as well. Not to
> mention tidy seemed to share the issues of stripping or choking on
> <mw:...> tags when it came to my editsection stuff. So even the idea of
> piping through tidy then using loadXML on it is out.
> - wiseparser, well I couldn't even get that to execute.
> - XML_HTMLSax is so old and unmaintained I couldn't really get into
> looking at it.
>
>
> The requirements ideally are that it should support the normal html
> parsing we already have (ie: boolean attributes and quoteless attributes
> <div foo bar=baz>, perhaps the simple implicitly closed tags like <br>),
> but also support parsing tags and attributes with mw: in them, in other
> words XML namespaces.
>
> Is there anyone willing to help out building a parser for it?
> Possibilities could be custom parsing directly to dom, custom parsing
> and calling a SAX-like api, or at its simplest a light parser that
> parses the html and outputs xml we can parse with loadXML instead (I
> believe the issue in DOMDocument is its html processing, not issues with
> namespaces), that would end up being a potential tidy replacement. Tidy
> can't be used in this case because it too messes up namespaced stuff.
>
> [1]:
> http://www.mediawiki.org/wiki/User:Dantman/Skinning_system#xml.2Fhtml_template_syntax
>
Why not just use XMLReader / XMLWriter, as WikiImporter does? Are there
performance concerns? It uses libxml; shouldn't that be good enough?
Dmitriy
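
For illustration, here is a minimal sketch of the "light parser that
outputs xml" option from the quoted message. It is written in Python
rather than PHP purely to keep the example short and self-contained, so it
is not the code anyone in this thread proposed. The names HtmlToXml,
html_to_xml, VOID_TAGS and the namespace URI MW_NS are hypothetical. It
also shows why a strict XML reader is unlikely to accept the template
source directly: input such as <div foo bar=baz> is not well-formed XML
until boolean and unquoted attributes are normalized.

```python
from html.parser import HTMLParser
from xml.sax.saxutils import escape, quoteattr
import xml.etree.ElementTree as ET

# Assumption: a small, incomplete set of implicitly closed (void) tags.
VOID_TAGS = {"br", "hr", "img", "input", "link", "meta"}
# Assumption: a placeholder namespace URI for the mw: prefix.
MW_NS = "http://example.org/mw"


class HtmlToXml(HTMLParser):
    """Re-emit leniently parsed HTML as well-formed XML text."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []

    def handle_starttag(self, tag, attrs):
        parts = [tag]
        for name, value in attrs:
            # A boolean attribute (value is None) becomes name="name".
            parts.append(name + "=" + quoteattr(name if value is None else value))
        closer = "/>" if tag in VOID_TAGS else ">"
        self.out.append("<" + " ".join(parts) + closer)

    def handle_endtag(self, tag):
        # Void tags were already emitted as self-closing.
        if tag not in VOID_TAGS:
            self.out.append("</" + tag + ">")

    def handle_data(self, data):
        self.out.append(escape(data))


def html_to_xml(source):
    """Wrap the converted markup in a root element that declares mw:."""
    parser = HtmlToXml()
    parser.feed(source)
    parser.close()
    return '<root xmlns:mw="' + MW_NS + '">' + "".join(parser.out) + "</root>"
```

Feeding it <div foo bar=baz><mw:editsection page=Main>edit</mw:editsection><br></div>
should give markup that a strict parser (here ET.fromstring, standing in
for loadXML) accepts with the mw: prefix resolved to a namespace. The
sketch lowercases tag and attribute names, does not repair unclosed
non-void tags such as <p>, and drops comments, so it is a starting point
under those assumptions and not a tidy replacement.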


_______________________________________________
Wikitech-l mailing list
[email protected]
https://lists.wikimedia.org/mailman/listinfo/wikitech-l