https://bugzilla.wikimedia.org/show_bug.cgi?id=164
--- Comment #205 from Philippe Verdy <verd...@wanadoo.fr> 2010-08-04 17:09:44 UTC ---
(In reply to comment #204)
> If people want to put crazy stuff in sortkeys that changes based on who's
> viewing it, we can't stop them. Curly braces are evaluated at an earlier
> stage than category links, so we can't make them behave differently based
> on whether they're being used in a sortkey, I don't think. You could also
> put {{CURRENTTIME}} in sortkeys, or any other silly thing like that.

That's because the MediaWiki parser still operates at the wrong level: it always performs text substitutions without regard to the context of use. Not only is this bad, it is also very inefficient, because each processing level converts a full page into another full page that must be reparsed at the next level.

A much better scheme, based on a true grammatical analyser driven by a finite-state automaton, would make it possible to define the state in which the parser (or its ParserFunctions extensions) operates, without ever having to create huge complete page buffers between levels (which requires costly append operations with many memory reallocations).

In other words, when the parser sees "[[" it enters a "start-of-link" state and reads "category:" up to the colon; at that point the whole prefix is case-folded and the parser moves to the "in-category" state, then scans up to the next "{", "|" or "]". It can then correctly process all the text using such rules.

I have long suggested that the MediaWiki syntax be reformulated as a formal LALR(1) grammar, from which a finite-state automaton covering all the state information can be built automatically and then ported to PHP (Yacc, Bison or, even better, PCTCS could do that without difficulty).
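The "start-of-link" and "in-category" transitions described above can be sketched as a small hand-written state machine. This is only an illustration, written in Python rather than PHP, and it is not how the MediaWiki parser is implemented; the state names, the extra "in-sortkey" state and the function scan_categories are all hypothetical. It records the sortkey text verbatim (braces included) rather than expanding it, which is the point at which a context-aware parser could choose to treat "{{...}}" differently.

```python
from enum import Enum, auto


class State(Enum):
    TEXT = auto()         # ordinary wikitext
    LINK_START = auto()   # just after "[[", reading a possible prefix
    IN_CATEGORY = auto()  # inside [[category:...]], reading the page name
    IN_SORTKEY = auto()   # after "|" inside a category link
    IN_LINK = auto()      # inside any other kind of link


def scan_categories(text):
    """Return (name, sortkey) pairs for each [[category:...]] link in text."""
    state = State.TEXT
    buf = []       # characters collected for the current token
    name = None    # category name, once a "|" has been seen
    found = []
    i = 0
    while i < len(text):
        two = text[i:i + 2]
        ch = text[i]
        if state is State.TEXT:
            if two == "[[":
                state, buf = State.LINK_START, []
                i += 2
                continue
        elif two == "]]":
            # End of link: emit a result only if we were in a category link.
            if state is State.IN_CATEGORY:
                found.append(("".join(buf).strip(), None))
            elif state is State.IN_SORTKEY:
                found.append((name, "".join(buf).strip()))
            state = State.TEXT
            i += 2
            continue
        elif state is State.LINK_START:
            if ch == ":":
                # The whole prefix is case-folded before it is compared.
                prefix = "".join(buf).strip().casefold()
                state = State.IN_CATEGORY if prefix == "category" else State.IN_LINK
                buf = []
            elif ch == "|":
                state = State.IN_LINK
            else:
                buf.append(ch)
        elif state is State.IN_CATEGORY:
            if ch == "|":
                name, buf, state = "".join(buf).strip(), [], State.IN_SORTKEY
            else:
                buf.append(ch)
        elif state is State.IN_SORTKEY:
            # The parser knows it is inside a sortkey here, so "{{...}}"
            # could be handled specially; this sketch just keeps it as text.
            buf.append(ch)
        i += 1
    return found
```

For example, scan_categories("[[Category:Cats|{{PAGENAME}}]]") reports the category "Cats" with the unexpanded sortkey "{{PAGENAME}}", in a single pass over the input and without building an intermediate copy of the page.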
Then, instead of calling parsing functions that process all the text and return the converted text for processing at the next level, the parser would run a much simpler (and faster) processing loop, calling much simpler generation methods and altering its state in a much cleaner and faster way (no more need to append small lexical items to various buffers: the atoms would be passed from level to level using a chained implementation).

This would also significantly speed up the expansion of templates, and would allow the parser to make distinctions when {{int:....}} is encountered in the context of a [[category:...]] (which would have temporarily forced the UI language to the CONTENTLANGUAGE until the final "]]" token is encountered, restoring the UI language within the parser's internal state). It would also have significant performance benefits: for example, there would be no more need to convert and expand all the parameters of {{#if:...}} or {{#switch:...}}; they would be converted lazily, only when they are really needed to generate the effective output.

Yes, this comment is going too far off the topic of this bug; it is architectural. PHP has all the tools needed to support the construction of tree-like data: instead of passing only strings to parser generation functions, you would pass an ordered array whose elements are the parsed parameters, themselves either strings or subparsed arrays containing strings and other arrays. The builtin functions would then request from the MediaWiki parser the evaluation/expansion of ONLY the parameters they need, and MediaWiki would still be able to call the expansion of ONLY these parameters, recursively.

Most of the items in the wikicode would then be atomic and processed lazily, only when they are effectively needed. The "crazy" things like {{CURRENTTIME}}, {{time:...}} or {{int:...}} found within [[category:...]], whose result depends on time or on user preferences, would be easily avoided.
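The lazy scheme described above, in which a builtin receives its parameters as unexpanded strings or nested arrays and asks the parser to expand only the ones it needs, might look roughly like the following. It is a sketch in Python for illustration, not PHP and not MediaWiki's actual API: the node layout (a list whose first element is the function name), the functions expand and pf_if, and the "trace" helper used to show which branches were evaluated are all assumptions.

```python
def expand(node, env):
    """Expand a parsed node: a plain string, or a list [function_name, arg, ...]."""
    if isinstance(node, str):
        return node
    name, *args = node
    # The builtin receives its arguments UNEXPANDED and decides what to expand.
    return FUNCTIONS[name](args, env)


def pf_if(args, env):
    """Sketch of {{#if: test | then | else}} with lazy branches."""
    test = expand(args[0], env).strip()
    branch = 1 if test else 2
    if branch >= len(args):
        return ""
    # Only the selected branch is ever expanded; the other is never touched.
    return expand(args[branch], env).strip()


def pf_trace(args, env):
    """Hypothetical helper: records that it was expanded, then returns its argument."""
    value = expand(args[0], env)
    env["expanded"].append(value)
    return value


FUNCTIONS = {"#if": pf_if, "trace": pf_trace}

# {{#if: yes | {{trace:A}} | {{trace:B}} }} as a tree of nested lists.
tree = ["#if", "yes", ["trace", "A"], ["trace", "B"]]
env = {"expanded": []}
output = expand(tree, env)
```

After this runs, env["expanded"] contains only "A": the else-branch was passed to pf_if as an unexpanded subtree and was never evaluated. Note also that trimming is done inside pf_if itself, so each builtin decides how whitespace in its own parameters is handled.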
This would also greatly simplify the management of whitespace (whether it needs to be trimmed and/or compressed depends on the builtin expansion function called by the parser).