2010-08-30 13:22, Jan Paul Posma wrote:
>>> It would be really nice to be able to include hooks after the lexer, but
>>> before actual parsing.
>>
>> That could be done, but I would not recommend it. What application do
>> you have in mind?
>
> Well, the current implementation of my editor uses a bunch of regexes (like
> the current parser) to determine where to inject spans or divs into the
> wikitext. Having a more accurate representation (the tokenized wikitext that
> the lexer outputs) would allow for more accurate injection. Then again, it
> would be complicated to interface that with PHP, I guess?

Between the lexer and the parser there is just the stream of tokens. How
that relates to the ultimately rendered content is non-trivial. I think
that you would be much better off working on top of the listener
interface. It would help you, I'd guess, to introduce the period
character (or, more generally, a localizable sentence-separator
character) as its own token and pass that as an event. But that cannot be
efficiently implemented as a "hook"; it has to be integrated into the
lexer. It should, however, be perfectly possible to define sentences in
the event stream even without such a token.
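To make that last point concrete, here is a minimal sketch (in Python, purely for illustration; the class and method names are hypothetical and do not correspond to the real parser's listener API) of a listener that derives sentence boundaries from plain text events by itself, without any dedicated separator token from the lexer:

```python
# Hypothetical sketch: a listener that reconstructs sentence boundaries
# from text events in a parser's event stream. All names are illustrative
# assumptions, not the actual listener interface.
import re


class SentenceListener:
    """Accumulates text events and splits them into sentences."""

    def __init__(self):
        self.buffer = ""
        self.sentences = []

    def on_text(self, text):
        # Accumulate incoming text; a real listener would also have to
        # cope with markup events interleaved between text events.
        self.buffer += text
        # Split on a period followed by whitespace; a localized
        # implementation would use the appropriate separator instead.
        while True:
            m = re.search(r"\.\s+", self.buffer)
            if not m:
                break
            self.sentences.append(self.buffer[:m.end()].strip())
            self.buffer = self.buffer[m.end():]

    def on_end(self):
        # Flush whatever remains as a final (possibly unterminated) sentence.
        if self.buffer.strip():
            self.sentences.append(self.buffer.strip())


listener = SentenceListener()
listener.on_text("First sentence. Second sen")
listener.on_text("tence. Trailing fragment")
listener.on_end()
print(listener.sentences)
# → ['First sentence.', 'Second sentence.', 'Trailing fragment']
```

Note that the buffering also handles sentences split across event boundaries, which is exactly the situation a hook between lexer and parser would otherwise have to deal with.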
> How would you handle hooks, tag extensions, parser functions and magic words
> anyway? Will you leave this to some post-processing stage in PHP or have
> things interact during parsing?

The listener interface in itself constitutes a collection of hooks. From
the parser's point of view, a tag extension works the same as <nowiki>.
It's up to the listening application to call the appropriate function to
process the content. Magic words and parser functions should be handled
by a preprocessor, as the substitution of these may yield new tokens.

/Andreas

_______________________________________________
Wikitext-l mailing list
[email protected]
https://lists.wikimedia.org/mailman/listinfo/wikitext-l
