Steve Bennett wrote:
> 
...
> The trouble there is that <ref> for example can contain
> wikitext...which needs to be parsed. e.g.:
> 
> <ref>''The origin of species'', Darwin</ref>
> 
> So at a minimum I think we would need to distinguish those extensions
> whose internal text needs to be parsed?

No. If a tag-style extension wants to support wiki text, it has to explicitly
invoke a new parser pass on the text contained between the tags. The text MUST
NOT be parsed/transformed before being passed to the extension, and what the
extension returns must not be parsed either (the latter is only partially true
for the current parser, but I would call that a bug, not a feature - see bug
8997).
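
To make that contract concrete, here is a minimal Python sketch. It is not
MediaWiki code: TAG_HOOKS, parse_inline, ref_hook, code_hook and render are all
hypothetical names, and the "parser" only understands ''italics''. The point
is only the control flow: the hook receives the raw text, decides for itself
whether to request a parser pass, and its return value is spliced in untouched.

```python
import re
from typing import Callable, Dict

# Hypothetical registry: tag name -> hook(raw_text, parse) -> final output.
TagHook = Callable[[str, Callable[[str], str]], str]
TAG_HOOKS: Dict[str, TagHook] = {}

def parse_inline(text: str) -> str:
    """Toy stand-in for a wikitext parser pass: only handles ''italics''."""
    return re.sub(r"''(.+?)''", r"<i>\1</i>", text)

def ref_hook(raw: str, parse: Callable[[str], str]) -> str:
    # This extension wants wikitext, so it explicitly asks for a parser pass.
    return '<sup class="ref">' + parse(raw) + "</sup>"

def code_hook(raw: str, parse: Callable[[str], str]) -> str:
    # This one does not: the raw text is escaped but never parsed.
    return "<pre>" + raw.replace("&", "&amp;").replace("<", "&lt;") + "</pre>"

TAG_HOOKS["ref"] = ref_hook
TAG_HOOKS["code"] = code_hook

# Simplification: no attributes, no nesting of the same tag.
TAG_RE = re.compile(r"<(\w+)>(.*?)</\1>", re.S)

def render(text: str) -> str:
    out = []
    pos = 0
    for m in TAG_RE.finditer(text):
        name = m.group(1)
        if name not in TAG_HOOKS:
            continue  # unknown tag: leave it to the ordinary parser pass
        out.append(parse_inline(text[pos:m.start()]))  # text before the tag
        # Raw text goes in; whatever comes back is NOT parsed again.
        out.append(TAG_HOOKS[name](m.group(2), parse_inline))
        pos = m.end()
    out.append(parse_inline(text[pos:]))
    return "".join(out)
```

With this, render("<ref>''The origin of species'', Darwin</ref>") gets its
italics only because ref_hook asked for them, while the same markup inside
<code>...</code> stays literal.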

>> 2) "parser functions" which conform to an extended template syntax:
...
> Afaik, these are converted by the preprocessor (recently rewritten by
> Tim), and are completely invisible to the parser?

I don't know. I don't see why parser functions should be handled by the
preprocessor while tag hooks are not. But maybe this is so.

> magic_word: UNDERSCORE UNDERSCORE  magic_word_text UNDERSCORE UNDERSCORE
> -> ^(MAGIC_WORD magic_word_text);
...
> It would only be a problem if the contents of the magic word
> interfered with the lexer - say a combination of letters and other
> punctuation. But if the available combinations were predefined (eg,
> hyphen hyphen letters digit hyphen hyphen) then they can be dealt
> with, and the letters themselves defined at runtime.

Magic words don't have to have the form __XXX__ - they can be characterized by
any regular expression. Consider how ISBN and RFC are treated - those are magic
words too... Oh, and please consider that the patterns are frequently localizable
(and are thus maintained in MediaWiki's messages files): French, for example,
allows __AUCUNETABLE__ for __NOTOC__. The same goes for #REDIRECT, btw: Dutch
allows #DOORVERWIJZING, etc...

I'm not entirely sure if extensions are free to define magic words using *any*
pattern, but I think this is so. MagicWord.php is entirely regex-based. Which
would mean that either your parser will only support some types of magic words,
or it needs a way to hook into the actual grammar.
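
A rough Python sketch of what "regex-based and localizable" means for a parser:
the set of patterns is only known at runtime, once the content language is
chosen. The ALIASES table, the simplified ISBN pattern and the function names
are my own assumptions for illustration; only the French and Dutch aliases come
from the examples above, and the real lists live in the messages files.

```python
import re
from typing import Dict, List, Tuple

# Hypothetical alias table: language -> magic word id -> accepted spellings.
ALIASES: Dict[str, Dict[str, List[str]]] = {
    "en": {"notoc": ["__NOTOC__"], "redirect": ["#REDIRECT"]},
    "fr": {"notoc": ["__AUCUNETABLE__", "__NOTOC__"], "redirect": ["#REDIRECT"]},
    "nl": {"notoc": ["__NOTOC__"], "redirect": ["#DOORVERWIJZING", "#REDIRECT"]},
}

# Not every magic word is a fixed string; this ISBN pattern is a simplified
# assumption, not the one MediaWiki actually uses.
PATTERN_WORDS: Dict[str, str] = {"isbn": r"ISBN\s+[0-9-]+[0-9Xx]"}

def build_matcher(lang: str) -> "re.Pattern[str]":
    """Build one regex for all magic words of a language, at runtime."""
    parts = []
    for word_id, forms in ALIASES[lang].items():
        # Longest spelling first so a shorter alias cannot shadow a longer one.
        alt = "|".join(re.escape(f) for f in sorted(forms, key=len, reverse=True))
        parts.append(f"(?P<{word_id}>{alt})")
    for word_id, pattern in PATTERN_WORDS.items():
        parts.append(f"(?P<{word_id}>{pattern})")
    return re.compile("|".join(parts))

def find_magic_words(text: str, lang: str) -> List[Tuple[str, str]]:
    """Return (magic word id, matched text) pairs in document order."""
    matcher = build_matcher(lang)
    return [(m.lastgroup, m.group(0)) for m in matcher.finditer(text)]
```

A grammar with a fixed UNDERSCORE UNDERSCORE ... rule would catch the first
kind of entry but not the second, which is the problem I mean.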

Oh, and "variables" like {{PAGENAME}} are treated as magic words internally,
though that wouldn't have to be so. I would probably use the template mechanism,
and simply intercept the use of special names.
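
Something like the following is what I have in mind, as a Python sketch only.
VARIABLES, TEMPLATES and expand are hypothetical names, the template store is a
plain dict, and parameters and nesting are ignored: the expander looks the name
up in a table of special names first and falls back to ordinary transclusion.

```python
import re
from typing import Callable, Dict

# Hypothetical table of intercepted names; each gets the page context.
VARIABLES: Dict[str, Callable[[Dict[str, str]], str]] = {
    "PAGENAME": lambda ctx: ctx["title"],
}

# Hypothetical template store standing in for the Template: namespace.
TEMPLATES: Dict[str, str] = {"Hello": "Hello from a template"}

# Simplification: {{Name}} only, no parameters and no nested braces.
TRANSCLUSION_RE = re.compile(r"\{\{([^{}|]+)\}\}")

def expand(text: str, ctx: Dict[str, str]) -> str:
    def repl(m: "re.Match[str]") -> str:
        name = m.group(1).strip()
        if name in VARIABLES:
            # Special name: intercepted before any template lookup.
            return VARIABLES[name](ctx)
        # Ordinary template; unknown names are left as they were written.
        return TEMPLATES.get(name, m.group(0))
    return TRANSCLUSION_RE.sub(repl, text)
```

The grammar then needs only one rule for {{...}}, and the list of variables
becomes data rather than syntax.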


-- Daniel

_______________________________________________
Wikitext-l mailing list
[email protected]
https://lists.wikimedia.org/mailman/listinfo/wikitext-l
