On Thu, Feb 21, 2008 at 01:12:34PM +1100, Steve Bennett wrote:
> On 2/21/08, Jay R. Ashworth <[EMAIL PROTECTED]> wrote:
> > On Thu, Feb 21, 2008 at 01:16:22AM +1100, Steve Bennett wrote:
> > > Time to take this grammar and do something with it.
> >
> > Build a parser with it, run it against the corpus, and see how often
> > each individual rule pukes?
>
> Ok. I've actually done a bit of that, but I guess I should ramp up the
> scale. It can be hard to detect pukage without actually generating
> XHTML and comparing it, though.
>
> Generally, though, the answer is "not often". Flip through some random
> wikitext. You'll find that a very small number of rules amount for the
> vast majority of actual use. Though that may change once I have to
> contend with the body of templates. People don't use tables much. They
> don't use HTML tags or entities much. They almost never use magic
> links (especially PMID - wtf is that about it). They almost never use
> horizontal rules, HTML comments and rarely even extensions like <ref>

I don't know if you remember it at this point, Steve, but one of the
reasons I threw "won't someone *please* build us a grammar-driven
parser" up in the air (and thanks, BTW :-), was precisely to get a
fairly reliable count of how often each possible bit o' grammar appears
in, say, en.wp, so as to get a feeling for what will break if the
syntax is restricted slightly...

That is to say that I concur with your instinct: 90/10 rule, I would
guess, here.

Cheers,
-- jra
-- 
Jay R. Ashworth          Baylink                      [EMAIL PROTECTED]
Designer                 The Things I Think           RFC 2100
Ashworth & Associates    http://baylink.pitas.com     '87 e24
St Petersburg FL USA     http://photo.imageinc.us     +1 727 647 1274

     Those who cast the vote decide nothing.
     Those who count the vote decide everything.
       -- (Joseph Stalin)
_______________________________________________
Wikitext-l mailing list
[email protected]
https://lists.wikimedia.org/mailman/listinfo/wikitext-l
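
For anyone wanting a first-pass feel for the per-construct counts discussed above before a grammar-driven parser is available, something like the following might do. It is only a rough sketch: the regular expressions are hypothetical approximations of a few constructs named in the thread (tables, entities, magic links, horizontal rules, HTML comments, <ref>), not rules taken from the grammar, and the names PATTERNS and count_constructs are made up for illustration. It will both over- and under-count compared with a real parser.

```python
import re
from collections import Counter

# Hypothetical, deliberately crude patterns. A grammar-driven parser
# would replace every one of these with a proper rule.
PATTERNS = {
    "table": re.compile(r"^\{\|", re.MULTILINE),
    "html_comment": re.compile(r"<!--.*?-->", re.DOTALL),
    "ref_extension": re.compile(r"<ref[\s>/]", re.IGNORECASE),
    "html_entity": re.compile(r"&(?:[A-Za-z]+|#\d+|#x[0-9A-Fa-f]+);"),
    "magic_link": re.compile(r"\b(?:PMID|RFC)\s+\d+|\bISBN\s+[\dXx-]+"),
    "horizontal_rule": re.compile(r"^-{4,}", re.MULTILINE),
    "internal_link": re.compile(r"\[\[[^\[\]]+\]\]"),
}


def count_constructs(pages):
    """Return a Counter of how often each pattern matches across pages.

    `pages` is any iterable of wikitext strings, for example page bodies
    pulled out of a dump by some other tool.
    """
    totals = Counter()
    for text in pages:
        for name, pattern in PATTERNS.items():
            totals[name] += len(pattern.findall(text))
    return totals


if __name__ == "__main__":
    sample = [
        "'''Foo''' is a [[bar]].\n----\nSee PMID 12345 &amp; more.<ref>x</ref>",
        '{| class="wikitable"\n| cell\n|}\n<!-- note -->\n[[Link]]',
    ]
    # Print constructs from most to least frequent.
    for name, count in count_constructs(sample).most_common():
        print(f"{name:16} {count}")
```

Run over a large enough sample of en.wp pages, the sorted output would give a crude check on the 90/10 guess; it says nothing about where a rule fails, only how often a construct shows up.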
