> -----Original Message-----
> From: [EMAIL PROTECTED]
> [mailto:[EMAIL PROTECTED] On Behalf Of Daniel Kinzler
> Sent: 13 February 2008 22:30
> To: Wikitext-l
> Subject: Re: [Wikitext-l] Draft 10 published
>
> > You're not going to get 100% compatibility moving from the multiple
> > search/replace method into a single parse.
> >
> > Hooks embedded within the parser, like InternalParseBeforeLinks and
> > ParserBeforeTidy, become impossible to do.
>
> True. I was thinking of "clean" tag hooks and parser functions. These
> should continue to work with a minimum of modification. I don't mind
> the black magic breaking.
>
> >> That is, the grammar should NOT know about <ref>, nor what it does,
> >> not even that it exists. It should simply have a facility that
> >> allows external (php) code to handle the characters (unchanged!)
> >> between (some specific) tags.
> >
> > Agreed, the grammar should know how to parse and correct tag-soup
> > style HTML/XML that gets handed off to be dealt with.
>
> Yes, though for the parser, there are three cases to consider for
> HTML/XML style tags:
>
> 1) (Whitelisted) HTML tags, which can occur "soupy", and are more or
> less passed through (or "tidied" into valid XHTML).
> 2) Other tags (potentially handled by an extension), which must match
> in pairs exactly and cause the parser to take anything *in between*
> LITERALLY and pass it to the extension for processing.
> 3) In case there is no such extension, it needs to go back, read the
> *tags* literally, and then parse the text between the tags.
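The three cases above can be sketched roughly as follows. This is a minimal illustration in Python, not MediaWiki's PHP code; the whitelist, the hook table, and the handler are all made-up names, and the real parser's behaviour (Sanitizer whitelisting, tag-hook dispatch) is far more involved.

```python
import re

# Hypothetical tables -- the real parser uses the Sanitizer's whitelist
# and a registry of extension tag hooks; these are made up for illustration.
HTML_WHITELIST = {"b", "i", "div"}
EXTENSION_HOOKS = {"ref": lambda text: "[note: %s]" % text}

TAG_RE = re.compile(r"<(/?)(\w+)([^>]*)>")

def handle_tags(text):
    out, pos = [], 0
    for m in TAG_RE.finditer(text):
        if m.start() < pos:
            continue  # already consumed as literal extension content
        out.append(text[pos:m.start()])
        slash, name = m.group(1), m.group(2).lower()
        if name in HTML_WHITELIST:
            # Case 1: whitelisted HTML may occur "soupy"; pass it through
            # (a real parser would also tidy it into valid XHTML).
            out.append(m.group(0))
            pos = m.end()
        elif name in EXTENSION_HOOKS and not slash:
            # Case 2: extension tags pair up; everything in between is
            # taken LITERALLY and handed to the extension for processing.
            close = text.find("</%s>" % name, m.end())
            if close == -1:
                close = end = len(text)  # unterminated: runs to end of text
            else:
                end = close + len(name) + 3  # skip past "</name>"
            out.append(EXTENSION_HOOKS[name](text[m.end():close]))
            pos = end
        else:
            # Case 3 / unrecognised tags: emit a literal '<' and carry on
            # parsing the rest of the text normally.
            out.append("&lt;")
            pos = m.start() + 1
    out.append(text[pos:])
    return "".join(out)
```

Note how case 2 makes the extension, not the grammar, responsible for the tag's content, which is the separation the thread is arguing for.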
All tag attributes are parsed by Sanitizer::decodeTagAttributes(), I
believe, so things like attributes with missing values (<foo bar>) are
possible for all tags.

On 2), I'm not sure they must always be matched in pairs. I think
something (possibly Parser::extractTagsAndParams()) allows unterminated
tags to run to the end of the text.

On 3), unrecognised tags should just cause the parser to output a
literal < and carry on parsing.

> There's even a fourth case, namely magic tags like <nowiki> that have
> to be known to the parser for special handling - these may also
> include <includeonly>, <onlyinclude> and <noinclude>, though those
> might be handled by the preprocessor, I'm not sure about that.

I believe (I haven't looked into it or implemented it yet) that
<onlyinclude> and <noinclude> are essentially filters that operate at
transclusion time. <includeonly> is a filter at save time? Preventing a
template being associated with a category, for example.

> In the case of (some!) parser functions, it has to be considered that
> the *output* of the extension would have to be parsed too, inlined.
> But that stuff is probably handled by the preprocessor - if that is
> indeed the case, there's nothing to worry about.
>
> -- Daniel

_______________________________________________
Wikitext-l mailing list
[email protected]
https://lists.wikimedia.org/mailman/listinfo/wikitext-l
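[Editorial sketch of the include-filter idea discussed above. This models the *effect* of the tags as two simple pre-expansion filters; whether they actually run at transclusion time, save time, or render time is exactly what the thread is unsure about, and the regex approach here is an illustration, not the preprocessor's real implementation.]

```python
import re

def filter_for_transclusion(wikitext):
    # When the template is transcluded: <noinclude> content is dropped,
    # while <includeonly> tags are stripped but their content is kept.
    text = re.sub(r"<noinclude>.*?</noinclude>", "", wikitext, flags=re.S)
    return re.sub(r"</?includeonly>", "", text)

def filter_for_page_view(wikitext):
    # When the template page itself is viewed: <includeonly> content is
    # dropped (so e.g. a category applied inside <includeonly> does not
    # attach to the template page), and <noinclude> tags are stripped.
    text = re.sub(r"<includeonly>.*?</includeonly>", "", wikitext, flags=re.S)
    return re.sub(r"</?noinclude>", "", text)

src = "Body<noinclude>docs</noinclude><includeonly>[[Category:X]]</includeonly>"
# Transcluding pages see the category; the template page sees its docs.
```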
