On Thu, Feb 21, 2008 at 03:43:54PM +1100, Steve Bennett wrote:
> On 2/21/08, Jay R. Ashworth <[EMAIL PROTECTED]> wrote:
> > I don't know if you remember it at this point, Steve, but one of the
> > reasons I threw "won't someone *please* build us a grammar-driven
> > parser" up in the air (and thanks, BTW :-), was precisely to get a
> > fairly reliable count of how often each possible bit'o'grammar appears
> > in, say, en.wp, so as to get a feeling for what will break if the
> > syntax is restricted slightly...
> >
> > That is to say that I concur with your instinct: 90/10 rule, I would
> > guess, here.
>
> Ah, yes.
>
> Well, it really should be pretty easy to produce some stats like "in
> this corpus, there are 850 inline links, 560 images, 227 bullet list
> items" etc. It will be harder to detect subtle things like "things
> which closely resemble, but aren't, inline images" or "external links
> wrapped in double square brackets by some moron".

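For illustration only, here is a rough sketch of the kind of tally
described above. It is not the grammar-driven parser under discussion:
it swaps in plain regular expressions, and every name in it (PATTERNS,
count_constructs, the category labels) is hypothetical. The patterns are
deliberately naive, so nested constructs such as a link inside an image
caption are missed, which is the sort of subtle case a real grammar
would have to handle.

```python
import re
from collections import Counter

# Hypothetical, deliberately naive patterns. These are assumptions made for
# illustration, not the actual MediaWiki grammar.
PATTERNS = {
    # [[Image:...]] or [[File:...]] with no nested brackets
    "image": re.compile(r"\[\[(?:Image|File):[^\[\]]*\]\]", re.IGNORECASE),
    # [[Target]] or [[Target|label]], excluding images and bare URLs
    "internal_link": re.compile(
        r"\[\[(?!(?:Image|File):|(?:https?|ftp)://)[^\[\]]*\]\]",
        re.IGNORECASE,
    ),
    # [http://... label] in single square brackets
    "external_link": re.compile(
        r"(?<!\[)\[(?:https?|ftp)://[^\s\]]+[^\]]*\](?!\])"
    ),
    # An external URL mistakenly wrapped in double square brackets
    "double_bracketed_url": re.compile(r"\[\[(?:https?|ftp)://[^\[\]]*\]\]"),
    # Any line starting with one or more asterisks
    "bullet_item": re.compile(r"^\*+", re.MULTILINE),
}


def count_constructs(wikitext: str) -> Counter:
    """Return how many times each naive pattern matches in the text."""
    counts = Counter()
    for name, pattern in PATTERNS.items():
        counts[name] = sum(1 for _ in pattern.finditer(wikitext))
    return counts


if __name__ == "__main__":
    sample = (
        "* See [[Main Page]] and [[Help:Contents|help]].\n"
        "* [[Image:Example.png|thumb|An example]]\n"
        "** [http://example.org Example site]\n"
        "Oops: [[http://example.com]]\n"
    )
    for name, n in sorted(count_constructs(sample).items()):
        print(f"{name}: {n}")
```

Running the same function over a larger corpus and summing the Counters
would give per-construct totals, with the caveat that a regex count is
only an approximation of what a real parser would report.
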
Oh sure. Building the test harness will be an iterative process.

But once someone does, we'll actually have not only a formal grammar,
but a second reference implementation...

Cheers,
-- jra
-- 
Jay R. Ashworth                   Baylink                [EMAIL PROTECTED]
Designer                     The Things I Think                   RFC 2100
Ashworth & Associates     http://baylink.pitas.com                 '87 e24
St Petersburg FL USA      http://photo.imageinc.us         +1 727 647 1274

             Those who cast the vote decide nothing.
             Those who count the vote decide everything.
               -- (Joseph Stalin)
