On 2/12/08, Daniel Kinzler <[EMAIL PROTECTED]> wrote:
> 1) "parser hook" extensions (aka tag hooks aka extension tags), which conform
> to a (fuzzy) XML syntax: <name foo="bar" bla=12 blubb>...</name>. The ... in
> between the tags should be completely opaque; the parser should skip
> everything up to the closing tag. There is no support for nesting, no
> expansion of templates or template parameters, nothing. Also, the text
> *returned* by the extension is expected to be HTML, and should be passed
> through the generation stage untouched.

The trouble there is that <ref>, for example, can contain
wikitext that itself needs to be parsed. e.g.:

<ref>''The origin of species'', Darwin</ref>

So at a minimum I think we would need to distinguish those extensions
whose internal text needs to be parsed?
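To make that distinction concrete, here's a rough Java sketch (all class and method names invented for illustration) of a tag-extension registry that records, per tag, whether the body is opaque or is wikitext the parser must recurse into:

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch (names invented): a tag-extension registry recording
// whether the body between <name>...</name> is opaque to the parser or is
// wikitext that must itself be parsed, as <ref> would need.
class TagExtensionRegistry {
    // true  = body is wikitext, recurse into it (<ref>)
    // false = body is completely opaque, skip to the closing tag (<nowiki>)
    private final Map<String, Boolean> parseBody = new HashMap<>();

    void register(String tagName, boolean bodyIsWikitext) {
        parseBody.put(tagName.toLowerCase(), bodyIsWikitext);
    }

    boolean isRegistered(String tagName) {
        return parseBody.containsKey(tagName.toLowerCase());
    }

    boolean bodyIsWikitext(String tagName) {
        return parseBody.getOrDefault(tagName.toLowerCase(), false);
    }
}

public class TagDemo {
    public static void main(String[] args) {
        TagExtensionRegistry reg = new TagExtensionRegistry();
        reg.register("nowiki", false);
        reg.register("ref", true);
        System.out.println(reg.bodyIsWikitext("ref"));    // true
        System.out.println(reg.bodyIsWikitext("nowiki")); // false
    }
}
```

The parser would consult the flag when it hits an opening tag: skip to the matching close for opaque tags, or re-enter the grammar on the body otherwise.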

>
> 2) "parser functions", which conform to an extended template syntax:
> {{#name: param|param|param...}}. In this case, all parameters have to be
> fully parsed and expanded, so this needs to work:
> {{#foo:xx|{{#bar|{{{bla|frob}}}}}|{{something}}}}
>
> The output of parser functions may be wikitext that has to be further
> processed in context (just as if it were a normal template), or it may be
> HTML that has to be passed through (and a few more minor options). This is
> determined by each extension when registering the hook.

Afaik, these are converted by the preprocessor (recently rewritten by
Tim), and are completely invisible to the parser?
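To illustrate the expansion order the quoted example requires, here's a rough Java sketch (all names invented; not the real preprocessor, which works on a parse tree rather than regexes) that expands nested parser-function calls innermost-first, so a hook always receives fully expanded parameters:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch (names invented): expand {{#name:p1|p2|...}} calls
// innermost-first, mimicking how the preprocessor fully expands parameters
// before handing them to a parser-function hook.
class ParserFunctionExpander {
    private final Map<String, Function<String[], String>> hooks = new HashMap<>();

    void register(String name, Function<String[], String> hook) {
        hooks.put(name, hook);
    }

    // Matches an innermost call: no nested braces inside the parameter list.
    private static final Pattern INNERMOST =
            Pattern.compile("\\{\\{#(\\w+):([^{}]*)\\}\\}");

    String expand(String text) {
        Matcher m;
        while ((m = INNERMOST.matcher(text)).find()) {
            Function<String[], String> hook = hooks.get(m.group(1));
            if (hook == null) {
                break; // unknown function; real code would leave it literal and move on
            }
            String[] params = m.group(2).split("\\|", -1);
            text = text.substring(0, m.start()) + hook.apply(params) + text.substring(m.end());
        }
        return text;
    }
}

public class ExpandDemo {
    public static void main(String[] args) {
        ParserFunctionExpander exp = new ParserFunctionExpander();
        exp.register("upper", p -> p[0].toUpperCase());
        exp.register("first", p -> p[0]);
        // The inner {{#upper:abc|x}} is expanded before the outer {{#first:...}} sees it.
        System.out.println(exp.expand("{{#first:{{#upper:abc|x}}|y}}")); // ABC
    }
}
```

If parser functions really are handled entirely at that preprocessor stage, the ANTLR grammar would only ever see their already-expanded output.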

> Extensions may also introduce arbitrary magic words. Such extensions are
> impossible to make compatible with a new ANTLR based parser; they would
> have to be rewritten as plugins to such a parser. Would it be possible to
> allow such plugins? I'm thinking of allowing a way for extensions to
> redefine individual bits of the grammar.

It depends a bit on the limits of these "arbitrary magic words". I
think it's actually surprisingly feasible to allow magic words that,
say, consist of strings of letters surrounded by space, or certain
predefined punctuation.

At first I thought that would be a nightmare, but in practice it
isn't. As the second-to-last rule before rendering a string of letters
literally, I would simply add a (Java/PHP) check to see if the string
matched any registered extension, and parse it as an extension magic
word instead. Here's how that happens with __TOC__ etc:

magic_word: UNDERSCORE UNDERSCORE  magic_word_text UNDERSCORE UNDERSCORE
-> ^(MAGIC_WORD magic_word_text);

magic_word_text: {is_magic_word()}? letters;

@members {
....
  boolean is_magic_word() {
    return
        input.LT(1).getText().equalsIgnoreCase("NOTOC") ||
        input.LT(1).getText().equalsIgnoreCase("TOC") ||
        input.LT(1).getText().equalsIgnoreCase("FORCETOC") ||
        input.LT(1).getText().equalsIgnoreCase("NOGALLERY") ||
        input.LT(1).getText().equalsIgnoreCase("NOEDITSECTION")
    ;
  }

}

It would only be a problem if the contents of the magic word
interfered with the lexer - say a combination of letters and other
punctuation. But if the available combinations were predefined (e.g.
hyphen hyphen letters digit hyphen hyphen) then they could be dealt
with, and the letters themselves defined at runtime.
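Defining the letters at runtime could look something like this rough Java sketch (class and method names invented): the hardcoded `is_magic_word()` list above replaced by a registry that extensions populate when they load, queried from the same semantic predicate:

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical sketch (names invented): the is_magic_word() predicate above,
// backed by a runtime registry instead of a hardcoded list of literals, so
// extensions can add their own magic words when they are loaded.
class MagicWordRegistry {
    private final Set<String> words = new HashSet<>();

    void register(String word) {
        words.add(word.toUpperCase());
    }

    // Would be called from the {is_magic_word()}? semantic predicate,
    // with input.LT(1).getText() as the candidate.
    boolean isMagicWord(String candidate) {
        return words.contains(candidate.toUpperCase());
    }
}

public class MagicDemo {
    public static void main(String[] args) {
        MagicWordRegistry reg = new MagicWordRegistry();
        reg.register("NOTOC");
        reg.register("NOGALLERY");
        System.out.println(reg.isMagicWord("notoc"));  // true
        System.out.println(reg.isMagicWord("NOFOO"));  // false
    }
}
```

That keeps the grammar itself fixed; only the set of accepted letter strings changes at runtime.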

Steve

_______________________________________________
Wikitext-l mailing list
[email protected]
https://lists.wikimedia.org/mailman/listinfo/wikitext-l