2010-09-27 20:58, Chad skrev:
> On Mon, Sep 27, 2010 at 1:42 PM, Aryeh Gregor
> <[email protected]> wrote:
>
>> On Mon, Sep 27, 2010 at 3:38 AM, Andreas Jonsson
>> <[email protected]> wrote:
>>
>>> Point me to one that has.
>>>
>> Maybe I'm wrong. I've never looked at them in depth. I don't mean to
>> be discouraging here. If you can replace the MediaWiki parser with
>> something sane, my hat is off to you. But if you don't receive a very
>> enthusiastic response from established developers, it's probably
>> because we've had various people trying to replace MediaWiki's parser
>> with a more conventional one since like 2003, and it's never produced
>> anything usable in practice. The prevailing sentiment is reflected
>> pretty well in Tim's commit summary from shortly before giving you
>> commit access:
>>
>> http://www.mediawiki.org/wiki/Special:Code/MediaWiki/71620
>>
>> Maybe we're just pessimistic, though. I'd be happy to be proven wrong!
>>
>>
> This. Tim sums up the consensus very well with that commit summary.
> He also made some comments on the history of wikitext and alternative
> parsers on foundation-l back in Jan '09[0]. Worth a read (starting mainly
> at ""Parser" is a convenient and short name for it").
>
> While a real parser is a nice pipe dream, in practice not a single project
> to "rewrite the parser" has succeeded in the years of people trying. Like
> Aryeh says, if you can pull it off and make it practical, hats off to you.
>
> -Chad
>
> [0] http://article.gmane.org/gmane.org.wikimedia.foundation/35876/
>
So, Tim raises three objections to a more formalized parser:
1. Formal grammars are too restricted for wikitext.
   My implementation represents a larger class of grammars than the
   class of context-free grammars. I believe that this gives
   sufficient room for wikitext.
2. Previous parser implementations had performance issues.
   I have not rigorously benchmarked my parser, but its running time
   is linear in the size of the input and seems to be comparable to
   the original parser on plain text. With an increasing amount of
   markup, the original parser seems to degrade in performance, while
   my implementation maintains a fairly constant speed regardless of
   input. It is possible to construct malicious input that causes my
   parser's performance to be offset by a constant factor (the same
   content scanned up to 13 times), but this is not a situation that
   would occur on a normal page.
3. Some aspects of the existing parser follow well-known parsing
   algorithms, but are better optimized; in particular, the
   preprocessor.
   My parser implementation does not preprocess the content. I
   acknowledge that preprocessing is better done by the current
   preprocessor. One just needs to disentangle the independent
   preprocessing (parser functions, transclusion, magic words, etc.)
   from the parser-preparation preprocessing (e.g., replacing <nowiki>
   ... </nowiki> with a "magic" string).
   Regarding optimization, it doesn't matter that the current parser
   is "optimized" if my unoptimized implementation outperforms the
   existing optimized one.
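To make point 1 concrete, here is a small sketch (mine, not code from either parser) of the apostrophe convention that makes wikitext awkward for a plain context-free grammar: what a run of apostrophes means depends on its length, and surplus apostrophes fall out as literal text. The real parser additionally re-balances unclosed quotes per line, which is where a simple CFG stops being enough.

```python
def classify_quote_run(n):
    """Interpret a run of n apostrophes using the wikitext convention
    ('' = italic, ''' = bold, ''''' = bold italic).  Runs of other
    lengths spill surplus apostrophes out as literal text.
    Returns (literal_prefix, markup_or_None)."""
    if n < 2:
        return "'" * n, None              # a lone apostrophe is plain text
    if n == 2:
        return "", "italic"
    if n == 3:
        return "", "bold"
    if n == 4:
        return "'", "bold"                # one literal apostrophe, then bold
    return "'" * (n - 5), "bold-italic"   # 5 or more: surplus is literal
```

For example, classify_quote_run(4) yields ("'", "bold"): four apostrophes are read as one literal apostrophe followed by a bold marker.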
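Regarding the linearity claim in point 2, the behaviour can be illustrated with a toy one-pass scanner (a sketch of the general technique, not my actual implementation, and the markup alphabet is an assumption): every character is examined exactly once, so markup density does not change the asymptotic cost.

```python
def tokenize(text):
    """Toy single-pass scanner: split input into runs of markup
    characters and runs of plain text, touching each character exactly
    once.  Returns (tokens, chars_examined); chars_examined == len(text)
    demonstrates the linear bound."""
    MARKUP = set("[]{}'<>=*#:|")   # assumed markup alphabet
    tokens, examined, i = [], 0, 0
    while i < len(text):
        is_markup = text[i] in MARKUP
        j = i
        while j < len(text) and (text[j] in MARKUP) == is_markup:
            examined += 1
            j += 1
        tokens.append(("markup" if is_markup else "text", text[i:j]))
        i = j
    return tokens, examined
```

A real benchmark would compare wall-clock time against markup density, but the invariant above is the reason the curve stays flat.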
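For the <nowiki> stripping in point 3, the usual technique is to stash such spans behind unique markers before parsing and restore them afterwards. A minimal sketch (the marker format here is hypothetical; MediaWiki uses its own UNIQ/QINU strip markers):

```python
import re

MARKER = "\x7fUNIQ-%d\x7f"  # hypothetical marker format

def strip_nowiki(text):
    """Replace <nowiki>...</nowiki> spans with unique markers so the
    parser never sees their contents; return (stripped_text, stash)."""
    stash = {}
    def stash_span(match):
        marker = MARKER % len(stash)
        stash[marker] = match.group(1)
        return marker
    stripped = re.sub(r"<nowiki>(.*?)</nowiki>", stash_span, text,
                      flags=re.S)
    return stripped, stash

def restore(text, stash):
    """Put the stashed contents back after parsing."""
    for marker, content in stash.items():
        text = text.replace(marker, content)
    return text
```

This is exactly the kind of parser-preparation preprocessing that could stay outside my parser while transclusion and parser functions remain in the current preprocessor.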
/Andreas
_______________________________________________
Wikitech-l mailing list
[email protected]
https://lists.wikimedia.org/mailman/listinfo/wikitech-l