> -----Original Message-----
> From: [EMAIL PROTECTED] 
> [mailto:[EMAIL PROTECTED] On Behalf Of 
> Daniel Kinzler
> Sent: 13 February 2008 22:30
> To: Wikitext-l
> Subject: Re: [Wikitext-l] Draft 10 published
> 
> > Your not going to get 100% compatibility moving from the multiple 
> > search/replace method into a single parse.
> > 
> > Hooks embedded within the parser,  like InternalParseBeforeLinks, 
> > ParserBeforeTidy become impossible to do.
> 
> True. I was thinking of "clean" tag hooks and parser 
> functions. These should continue to work with a minimum of 
> modification. I don't mind the black magic breaking.
> 
> >> That is, the grammar should NOT know about <ref>, not what 
> it does, 
> >> not even that it exists. It should simply have a facility 
> that allows 
> >> external (PHP) code to handle the characters (unchanged!) between 
> >> (some specific) tags.
> > 
> > Agreed, the grammar should know how to parse and correct 
> > tag-soup style HTML/XML that gets handed off to be dealt with.
> 
> Yes, though for the parser, there are three cases to consider 
> for HTML/XML style
> tags:
> 
> 1) (whitelisted) HTML tags, which can occur "soupy", and are 
> more or less passed through (or "tidied" into valid xhtml).
> 2) Other tags (potentially handled by an extension) which 
> must match in pairs exactly and cause the parser to take 
> anything *inbetween* LITERALLY, and pass it to the extension 
> for processing.
> 3) In case there is no such extension, it needs to go back, 
> read the *tags* literally, and then parse the text between the tags.

All tag attributes are parsed by Sanitizer::decodeTagAttributes(), I
believe, so things like attributes with missing values (<foo bar>) are
possible for all tags.
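Roughly, something like this (a Python sketch, purely illustrative -- not
the real Sanitizer::decodeTagAttributes() logic): attributes can be
name="value", name=value, or a bare name with no value at all.

```python
import re

# Illustrative only -- not the actual Sanitizer code. Matches a bare
# attribute name, optionally followed by = and a quoted or unquoted value.
ATTRIB_RE = re.compile(r'(\w+)(?:\s*=\s*("([^"]*)"|\S+))?')

def decode_tag_attributes(text):
    """Return a dict of attribute name -> value; bare names map to ''."""
    attribs = {}
    for m in ATTRIB_RE.finditer(text):
        name = m.group(1).lower()
        if m.group(2) is None:        # bare attribute, e.g. <foo bar>
            attribs[name] = ''
        elif m.group(3) is not None:  # double-quoted value
            attribs[name] = m.group(3)
        else:                         # unquoted value
            attribs[name] = m.group(2)
    return attribs
```

So <foo bar> just yields {'bar': ''} rather than being rejected.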

On 2, I'm not sure they must always be matched in pairs. I think something
(possibly Parser::extractTagsAndParams()) allows unterminated tags to run
to the end of the text.

On 3, unrecognised tags should just cause the parser to output a &lt; and
carry on parsing. 
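Putting 2 and 3 together, the extraction step might look something like
this (a Python sketch under those assumptions, not the real
Parser::extractTagsAndParams(); the registered tag names are just
examples): a registered extension tag takes its content literally, an
unterminated one runs to the end of the text, and an unrecognised tag is
simply left in place for the parser to escape.

```python
import re

# Hypothetical registry of extension tag hooks; names are examples only.
EXTENSION_TAGS = {'ref', 'nowiki'}

def extract_tags(text):
    """Split text into ('text', s) and ('ext', name, body) chunks."""
    chunks = []
    pos = 0
    for m in re.finditer(r'<(\w+)[^>]*>', text):
        if m.start() < pos:
            continue  # inside a previously consumed extension body
        name = m.group(1).lower()
        if name not in EXTENSION_TAGS:
            continue  # case 3: leave the '<' for the parser to handle
        chunks.append(('text', text[pos:m.start()]))
        close = re.search(r'</%s\s*>' % name, text[m.end():])
        if close:  # matched pair: everything between is taken literally
            body = text[m.end():m.end() + close.start()]
            pos = m.end() + close.end()
        else:      # unterminated: body runs to the end of the text
            body = text[m.end():]
            pos = len(text)
        chunks.append(('ext', name, body))
    if pos < len(text):
        chunks.append(('text', text[pos:]))
    return chunks
```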

> 
> There's even a fourth case, namely magic tags like <nowiki> 
> that have to be known to the parser for special handling - 
> these may also include <includeonly>, <onlyinclude> and 
> <noinclude>, though those might be handled by the 
> preprocessor, i'm not sure about that.

I believe (I haven't looked into it or implemented it yet) that
onlyinclude and noinclude are essentially filters that occur at
transclusion time. Includeonly is a filter at save time? Preventing a
template from being associated with a category, for example. 
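As a rough model of that idea (Python, illustrative only -- the real
logic lives in the preprocessor, and whether includeonly applies at save
time or view time is exactly the open question above), the three tags
behave as filters keyed on whether the page is being transcluded:

```python
import re

def apply_include_filters(text, transcluding):
    """Strip include-control tags the way the preprocessor might.

    Transcluding: if any <onlyinclude> block exists, use only those
    blocks; otherwise drop <noinclude> bodies and keep <includeonly>
    bodies. Viewing directly: drop <includeonly> bodies, keep the rest.
    """
    if transcluding:
        only = re.findall(r'<onlyinclude>(.*?)</onlyinclude>', text, re.S)
        if only:
            return ''.join(only)
        text = re.sub(r'<noinclude>.*?</noinclude>', '', text, flags=re.S)
        return re.sub(r'</?includeonly>', '', text)
    text = re.sub(r'<includeonly>.*?</includeonly>', '', text, flags=re.S)
    return re.sub(r'</?(noinclude|onlyinclude)>', '', text)
```

So a template wrapping its category link in <includeonly> would
categorise the pages it's transcluded on, not the template page itself.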

> 
> In the case of (some!) parser functions, it has to be 
> considered that the *output* of the extension would have to 
> be parsed too, and inlined. But that stuff is probably 
> handled by the preprocessor - if that is indeed the case, 
> there's nothing to worry about.
> 
> -- Daniel
> 
> _______________________________________________
> Wikitext-l mailing list
> [email protected]
> https://lists.wikimedia.org/mailman/listinfo/wikitext-l

