On 28/03/10 18:59, Aryeh Gregor wrote:
> On Fri, Mar 26, 2010 at 10:48 PM, Damon Wang<[email protected]> wrote:
>
>> (You also as a Mediawiki extension rather than a core feature; I'm going
>> to do that, but I won't say anything more because it seems fairly
>> uncontroversial.)
>>
> I actually disagree with this pretty strongly. It would be a
> regression in functionality for existing users -- if they upgrade,
> their wiki breaks unless they install a new extension. There's no
> reason to remove it from core that I see that outweighs this
> disadvantage.
>
>
>> Since the subset of TeX you need parsed has a context-free grammar, it
>> needs an LALR parser, not just a bunch of regexes. I know three ways to
>> get an LALR parser:
>>
>> (1) write a pushdown automaton manually (i.e., be yacc)
>> (2) write input for a parser-generator
>> (3) write a parser-generator, and give it input
>>
>> Option (2) is the most maintainable and feasible option, and it's
>> precisely the one that cannot be done in PHP. As far as I know, PHP has
>> no parser-generator package. (Please, please let me know if that's
>> incorrect so I can stop embarrassing myself and get on with writing a
>> GSoC proposal.)
>>
>> I could probably do (1), or some hackish kludge at half of it, by
>> throwing custom control structures into a bucketload of regexes, but I
>> don't think that's in the project's best interests. As has been pointed
>> out, the OCaml implementation is really concise and elegant. A large
>> fraction of that concision and elegance comes from not actually being a
>> parser but rather only a context-free grammar written in a BNF-like
>> syntax common to most parser-generators.
>>
> Okay, well, maybe you're right. I'd be interested to hear Tim
> Starling's opinion on this (using parser generators vs. writing by
> hand). Writing it in Python would certainly be a big step forward
> from OCaml -- any site with LaTeX accessible to MediaWiki will almost
> certainly have Python available, so Python vs. PHP should make no
> difference to end-users. And Python is probably the second-best-known
> language among MediaWiki hackers.
>
Have you had a look at pyparsing? It is a ready-made
all-singing-all-dancing Python parser package with a large amount of
syntactic sugar built in to allow more-or-less direct input of grammar
notations.

Given that the texvc source already has a grammar encoded in it in
machine-executable form, it might be worth mechanically extracting that
grammar from the texvc OCaml source and then reformatting it into
pyparsing's natural format.

-- Neil
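
P.S. To make the "context-free grammar, not just regexes" point
concrete, here is a minimal sketch in plain Python (standard library
only) of a hand-written recursive-descent parser for nested brace
groups. The grammar is a toy made up for illustration, not the one in
texvc, and the names tokenize and parse are hypothetical. It is closer
to option (1) than to what a parser package would give you, but it
shows why arbitrary nesting needs recursion (or a stack) rather than a
flat regex.

```python
# Toy grammar (an assumption for illustration, NOT texvc's real grammar):
#   expr  := item*
#   item  := COMMAND | CHAR | group
#   group := '{' expr '}'
import re

# A token is a control word, a control symbol, a brace, or one other
# non-space character.
TOKEN = re.compile(r"\\[A-Za-z]+|\\.|[{}]|[^\\{}\s]")


def tokenize(src):
    """Split the input into commands, braces and single characters."""
    return TOKEN.findall(src)


def parse(src):
    """Return a nested list that mirrors the brace structure of src."""
    tokens = tokenize(src)
    tree, pos = _expr(tokens, 0)
    if pos != len(tokens):
        # _expr stopped early, which only happens on a stray '}'.
        raise ValueError("unmatched '}' at token %d" % pos)
    return tree


def _expr(tokens, pos):
    """Parse items until the end of input or an unconsumed '}'."""
    items = []
    while pos < len(tokens) and tokens[pos] != "}":
        if tokens[pos] == "{":
            # Recurse for the group body; a regex cannot count depth.
            inner, pos = _expr(tokens, pos + 1)
            if pos >= len(tokens):
                raise ValueError("missing '}'")
            pos += 1  # consume the closing '}'
            items.append(inner)
        else:
            items.append(tokens[pos])
            pos += 1
    return items, pos
```

Calling parse(r"\frac{a}{\sqrt{b}}") returns a nested list whose shape
follows the braces, and unbalanced input such as "{a" or "a}" raises
ValueError.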
