On 2/21/08, Jay R. Ashworth <[EMAIL PROTECTED]> wrote:
> I don't know if you remember it at this point, Steve, but one of the
>  reasons I threw "won't someone *please* build us a grammar-driven
>  parser" up in the air (and thanks, BTW :-), was precisely to get a
>  fairly reliable count of how often each possible bit'o'grammar appears
>  in, say, en.wp, so as to get a feeling for what will break if the
>  syntax is restricted slightly...
>
>  That is to say that I concur with your instinct: 90/10 rule, I would
>  guess, here.

Ah, yes.

Well, it really should be pretty easy to produce some stats like "in
this corpus, there are 850 inline links, 560 images, 227 bullet list
items" etc. It will be harder to detect subtle things like "things
which closely resemble, but aren't, inline images" or "external links
wrapped in double square brackets by some moron".
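
For the easy half, a rough sketch of that kind of count might look like
the following. The patterns and the names PATTERNS and count_constructs
are made up for illustration; they are naive regexes, not a real
grammar, so they will miscount nested constructs (an image whose caption
contains a link, say) and they make no attempt at the subtle
near-misses.

```python
import re
from collections import Counter

# Hypothetical, deliberately naive patterns. Real wikitext needs a proper
# parser; these only catch the simplest, non-nested forms.
PATTERNS = {
    # [[Image:...]] or [[File:...]] with no nested brackets inside
    "image": re.compile(r"\[\[(?:Image|File):[^\[\]]*\]\]", re.IGNORECASE),
    # [[...]] that is neither an image nor a URL in double brackets
    "internal_link": re.compile(
        r"\[\[(?!(?:Image|File):)(?!https?://)[^\[\]]+\]\]", re.IGNORECASE
    ),
    # a line beginning with one or more asterisks
    "bullet_item": re.compile(r"^\*+", re.MULTILINE),
    # an external URL wrongly wrapped in double square brackets
    "bracketed_url": re.compile(r"\[\[https?://[^\[\]]*\]\]"),
}


def count_constructs(text):
    """Return a Counter mapping construct name to number of matches in text."""
    return Counter({name: len(p.findall(text)) for name, p in PATTERNS.items()})
```

Summing the Counters over every page in a dump would give the per-corpus
totals; anything these patterns get wrong is exactly where the
grammar-driven parser would have to take over.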

Steve

_______________________________________________
Wikitext-l mailing list
[email protected]
https://lists.wikimedia.org/mailman/listinfo/wikitext-l
