2010-09-27 20:58, Chad skrev:
> On Mon, Sep 27, 2010 at 1:42 PM, Aryeh Gregor
> <[email protected]> wrote:
>
>> On Mon, Sep 27, 2010 at 3:38 AM, Andreas Jonsson
>> <[email protected]> wrote:
>>
>>> Point me to one that has.
>>>
>> Maybe I'm wrong. I've never looked at them in depth. I don't mean to
>> be discouraging here. If you can replace the MediaWiki parser with
>> something sane, my hat is off to you. But if you don't receive a very
>> enthusiastic response from established developers, it's probably
>> because we've had various people trying to replace MediaWiki's parser
>> with a more conventional one since like 2003, and it's never produced
>> anything usable in practice. The prevailing sentiment is reflected
>> pretty well in Tim's commit summary from shortly before giving you
>> commit access:
>>
>> http://www.mediawiki.org/wiki/Special:Code/MediaWiki/71620
>>
>> Maybe we're just pessimistic, though. I'd be happy to be proven wrong!
>>
>>
> This. Tim sums up the consensus very well with that commit summary.
> He also made some comments on the history of wikitext and alternative
> parsers on foundation-l back in Jan '09[0]. Worth a read (starting mainly
> at ""Parser" is a convenient and short name for it").
>
> While a real parser is a nice pipe dream, in practice not a single project
> to "rewrite the parser" has succeeded in the years of people trying. Like
> Aryeh says, if you can pull it off and make it practical, hats off to you.
>
> -Chad
>
> [0] http://article.gmane.org/gmane.org.wikimedia.foundation/35876/
>
So, Tim raises three objections to a more formalized parser:
1. Formal grammars are too restricted for wikitext.
   My implementation represents a larger class of grammars than the
   class of context-free grammars. I believe that this gives
   sufficient room for wikitext.
2. Previous parser implementations had performance issues.
   I have not rigorously benchmarked my parser, but its running time
   is linear in the size of the input and seems to be comparable to
   the original parser on plain text. With an increasing amount of
   markup, the original parser seems to degrade in performance, while
   my implementation maintains a fairly constant speed regardless of
   input. It is possible to construct malicious input that causes my
   parser's performance to be offset by a constant factor (the same
   content scanned up to 13 times), but this is not a situation that
   would occur on a normal page.
3. Some aspects of the existing parser follow well-known parsing
   algorithms, but are better optimized; in particular, the
   preprocessor.
   My parser implementation does not preprocess the content. I
   acknowledge that preprocessing is better done by the current
   preprocessor. One just needs to disentangle the independent
   preprocessing (parser functions, transclusion, magic words, etc.)
   from the parser-preparation preprocessing (e.g., replacing <nowiki>
   ... </nowiki> with a "magic" string).
   Regarding optimization, it doesn't matter that the current parser
   is "optimized" if my unoptimized implementation outperforms the
   existing optimized one.
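To make point 1 concrete, here is a small sketch (mine, not code from either parser) of the apostrophe convention that makes wikitext awkward for a plain context-free grammar: what a run of apostrophes means depends on its length, and surplus apostrophes fall out as literal text. The real parser additionally re-balances unclosed quotes per line, which is where a simple CFG stops being enough.

```python
def classify_quote_run(n):
    """Interpret a run of n apostrophes using the wikitext convention
    ('' = italic, ''' = bold, ''''' = bold italic).  Runs of other
    lengths spill surplus apostrophes out as literal text.
    Returns (literal_prefix, markup_or_None)."""
    if n < 2:
        return "'" * n, None              # a lone apostrophe is plain text
    if n == 2:
        return "", "italic"
    if n == 3:
        return "", "bold"
    if n == 4:
        return "'", "bold"                # one literal apostrophe, then bold
    return "'" * (n - 5), "bold-italic"   # 5 or more: surplus is literal
```

For example, classify_quote_run(4) yields ("'", "bold"): four apostrophes are read as one literal apostrophe followed by a bold marker.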
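Regarding the linearity claim in point 2, the behaviour can be illustrated with a toy one-pass scanner (a sketch of the general technique, not my actual implementation, and the markup alphabet is an assumption): every character is examined exactly once, so markup density does not change the asymptotic cost.

```python
def tokenize(text):
    """Toy single-pass scanner: split input into runs of markup
    characters and runs of plain text, touching each character exactly
    once.  Returns (tokens, chars_examined); chars_examined == len(text)
    demonstrates the linear bound."""
    MARKUP = set("[]{}'<>=*#:|")   # assumed markup alphabet
    tokens, examined, i = [], 0, 0
    while i < len(text):
        is_markup = text[i] in MARKUP
        j = i
        while j < len(text) and (text[j] in MARKUP) == is_markup:
            examined += 1
            j += 1
        tokens.append(("markup" if is_markup else "text", text[i:j]))
        i = j
    return tokens, examined
```

A real benchmark would compare wall-clock time against markup density, but the invariant above is the reason the curve stays flat.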
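For the <nowiki> stripping in point 3, the usual technique is to stash such spans behind unique markers before parsing and restore them afterwards. A minimal sketch (the marker format here is hypothetical; MediaWiki uses its own UNIQ/QINU strip markers):

```python
import re

MARKER = "\x7fUNIQ-%d\x7f"  # hypothetical marker format

def strip_nowiki(text):
    """Replace <nowiki>...</nowiki> spans with unique markers so the
    parser never sees their contents; return (stripped_text, stash)."""
    stash = {}
    def stash_span(match):
        marker = MARKER % len(stash)
        stash[marker] = match.group(1)
        return marker
    stripped = re.sub(r"<nowiki>(.*?)</nowiki>", stash_span, text,
                      flags=re.S)
    return stripped, stash

def restore(text, stash):
    """Put the stashed contents back after parsing."""
    for marker, content in stash.items():
        text = text.replace(marker, content)
    return text
```

This is exactly the kind of parser-preparation preprocessing that could stay outside my parser while transclusion and parser functions remain in the current preprocessor.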
/Andreas
_______________________________________________
Wikitech-l mailing list
[email protected]
https://lists.wikimedia.org/mailman/listinfo/wikitech-l