On 28/03/10 18:59, Aryeh Gregor wrote:
> On Fri, Mar 26, 2010 at 10:48 PM, Damon Wang <[email protected]> wrote:
>    
>> (You also mentioned doing it as a MediaWiki extension rather than a
>> core feature; I'm going to do that, but I won't say anything more
>> because it seems fairly uncontroversial.)
>>      
> I actually disagree with this pretty strongly.  It would be a
> regression in functionality for existing users -- if they upgrade,
> their wiki breaks unless they install a new extension.  I see no
> reason to remove it from core that outweighs this disadvantage.
>
>    
>> Since the subset of TeX you need parsed has a context-free grammar, it
>> needs a proper parser (for example an LALR parser), not just a bunch of
>> regexes. I know three ways to get an LALR parser:
>>
>>     (1) write a pushdown automaton manually (i.e., be yacc)
>>     (2) write input for a parser-generator
>>     (3) write a parser-generator, and give it input
>>
>> Option (2) is the most maintainable and feasible, and it's
>> precisely the one that cannot be done in PHP. As far as I know, PHP has
>> no parser-generator package. (Please, please let me know if that's
>> incorrect so I can stop embarrassing myself and get on with writing a
>> GSoC proposal.)
>>
>> I could probably do (1), or a hackish half-implementation of it, by
>> throwing custom control structures into a bucketload of regexes, but I
>> don't think that's in the project's best interests. As has been pointed
>> out, the OCaml implementation is really concise and elegant. A large
>> fraction of that concision and elegance comes from its not being a
>> hand-written parser at all, but rather a context-free grammar written
>> in a BNF-like syntax common to most parser-generators.
>>      
> Okay, well, maybe you're right.  I'd be interested to hear Tim
> Starling's opinion on this (using parser generators vs. writing by
> hand).  Writing it in Python would certainly be a big step forward
> from OCaml -- any site with LaTeX accessible to MediaWiki will almost
> certainly have Python available, so Python vs. PHP should make no
> difference to end-users.  And Python is probably the second-best-known
> language among MediaWiki hackers.
>    

Have you had a look at pyparsing, which is a ready-made 
all-singing-all-dancing Python parser package with a large amount of 
syntactic sugar built in to allow the more-or-less direct input of 
grammar notations?
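
To illustrate the idea of writing the grammar down as data and letting
generic code do the parsing, here is a minimal stdlib-only sketch. It does
not use pyparsing (whose API looks different); the toy grammar, the token
names and the helpers `tokenize`, `parse` and `parse_all` are all
hypothetical and cover only a made-up scrap of TeX, not texvc's real
grammar.

```python
import re

# Hypothetical toy grammar for a tiny TeX-like subset, written as data.
# Keys are nonterminals; each maps to a list of alternatives, and each
# alternative is a sequence of symbols. Any symbol that is not a key is a
# token kind produced by the lexer below.
GRAMMAR = {
    "expr": [["atom", "expr"], []],            # zero or more atoms
    "atom": [["LBRACE", "expr", "RBRACE"],      # {...} group
             ["COMMAND", "atom"],               # \cmd followed by one argument
             ["LETTER"]],                       # single letter or digit
}

# Token kinds as (name, regex) pairs; SKIP tokens are discarded.
TOKEN_SPEC = [
    ("COMMAND", r"\\[a-zA-Z]+"),
    ("LBRACE", r"\{"),
    ("RBRACE", r"\}"),
    ("LETTER", r"[a-zA-Z0-9]"),
    ("SKIP", r"\s+"),
]
TOKEN_RE = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))


def tokenize(text):
    """Split text into (kind, value) pairs, raising SyntaxError on junk."""
    tokens, pos = [], 0
    while pos < len(text):
        m = TOKEN_RE.match(text, pos)
        if m is None:
            raise SyntaxError(f"unexpected character {text[pos]!r} at {pos}")
        if m.lastgroup != "SKIP":
            tokens.append((m.lastgroup, m.group()))
        pos = m.end()
    return tokens


def parse(symbol, tokens, pos=0):
    """Match `symbol` at tokens[pos:]; return (tree, next_pos) or None."""
    if symbol not in GRAMMAR:  # terminal: compare the token kind
        if pos < len(tokens) and tokens[pos][0] == symbol:
            return tokens[pos][1], pos + 1
        return None
    for alternative in GRAMMAR[symbol]:  # ordered choice: first match wins
        children, p = [], pos
        for part in alternative:
            result = parse(part, tokens, p)
            if result is None:
                break  # this alternative failed; try the next one
            child, p = result
            children.append(child)
        else:
            return (symbol, children), p
    return None


def parse_all(text):
    """Parse the whole input as an `expr`, rejecting trailing tokens."""
    tokens = tokenize(text)
    result = parse("expr", tokens)
    if result is None or result[1] != len(tokens):
        raise SyntaxError("input does not match the grammar")
    return result[0]
```

For example, parse_all(r"\sqrt{x}") returns a nested (symbol, children)
tree. Note that this is a top-down parser with ordered choice and
backtracking, not LALR, so a left-recursive rule such as `expr: expr atom`
would have to be rewritten (e.g. as `expr: atom expr`) before it could be
used, and badly shaped grammars can make it backtrack a lot.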

Given that the texvc source already has a grammar encoded into it in 
machine-executable form, it might be an idea to consider mechanically
extracting that grammar from the texvc OCaml source and then reformatting
it into a grammar in pyparsing's natural format.
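
A minimal sketch of the mechanical-extraction step, assuming the grammar
sits in an ocamlyacc-style `.mly` file (rules after the first `%%`,
alternatives separated by `|`, each rule ended by `;`, semantic actions in
`{ ... }`). `SAMPLE_MLY` is a made-up fragment, not the real texvc source,
and `strip_actions` / `extract_grammar` are hypothetical helpers:

```python
import re

# Made-up ocamlyacc-style fragment standing in for the real grammar file.
SAMPLE_MLY = """
%token LBRACE RBRACE
%token <string> LITERAL
%start expr
%%
expr:
    /* empty */        { [] }
  | atom expr          { $1 :: $2 }
;
atom:
    LBRACE expr RBRACE { ignore 0; Group $2 }
  | LITERAL            { Leaf $1 }
;
"""


def strip_actions(text):
    """Remove { ... } semantic actions, tracking brace nesting depth."""
    out, depth = [], 0
    for ch in text:
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
        elif depth == 0:
            out.append(ch)
    return "".join(out)


def extract_grammar(mly_source):
    """Return {nonterminal: [alternative, ...]} with actions removed."""
    rules = mly_source.split("%%")[1]  # rules section only
    rules = re.sub(r"/\*.*?\*/", "", rules, flags=re.S)  # drop /* */ comments
    # Strip actions *before* splitting, so ':', '|' and ';' inside the
    # OCaml action code cannot be mistaken for grammar punctuation.
    rules = strip_actions(rules)
    grammar = {}
    for rule in rules.split(";"):
        if ":" not in rule:
            continue  # trailing whitespace after the last rule
        name, body = rule.split(":", 1)
        grammar[name.strip()] = [alt.split() for alt in body.split("|")]
    return grammar
```

This is deliberately naive: braces inside OCaml string literals or
comments within actions, and directives such as `%prec`, would need extra
handling. The resulting dict of productions is the kind of intermediate
form one could then rewrite, by hand or with a small generator, into
pyparsing expressions.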

-- Neil


_______________________________________________
Wikitech-l mailing list
[email protected]
https://lists.wikimedia.org/mailman/listinfo/wikitech-l
