On Thu, Feb 21, 2008 at 03:43:54PM +1100, Steve Bennett wrote:
> On 2/21/08, Jay R. Ashworth <[EMAIL PROTECTED]> wrote:
> > I don't know if you remember it at this point, Steve, but one of the
> >  reasons I threw "won't someone *please* build us a grammar-driven
> >  parser" up in the air (and thanks, BTW :-), was precisely to get a
> >  fairly reliable count of how often each possible bit'o'grammar appears
> >  in, say, en.wp, so as to get a feeling for what will break if the
> >  syntax is restricted slightly...
> >
> >  That is to say that I concur with your instinct: 90/10 rule, I would
> >  guess, here.
> 
> Ah, yes.
> 
> Well, it really should be pretty easy to produce some stats like "in
> this corpus, there are 850 inline links, 560 images, 227 bullet list
> items" etc. It will be harder to detect subtle things like "things
> which closely resemble, but aren't, inline images" or "external links
> wrapped in double square brackets by some moron".

Oh sure.  Building the test harness will be an iterative process.  But
once someone does, we'll actually have not only a formal grammar, but a
second reference implementation...
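As a rough first iteration of that harness, Steve's "850 inline links, 560 images, 227 bullet list items" style of tally can be approximated with a few regexes run over page texts. This is only a sketch. It is regex heuristics, not the grammar-driven parser under discussion. The pattern set, the names `PATTERNS` and `count_constructs`, and the choice of constructs are hypothetical. Templates, nesting and `<nowiki>` are ignored, so the numbers are approximate. It also shows how a "looks like X but isn't" case, such as an external URL inside `[[...]]`, can get its own bucket:

```python
import re
from collections import Counter

# Hypothetical heuristic patterns for a few wikitext constructs.
# Regexes, not a grammar: templates, nesting and <nowiki> are ignored,
# so the resulting counts are approximate.
PATTERNS = {
    # Internal link whose target starts with File: or Image:
    "image": re.compile(r"\[\[(?:File|Image):", re.IGNORECASE),
    # [[Target]] or [[Target|label]], excluding images and URLs
    "inline_link": re.compile(
        r"\[\[(?!(?:File|Image):|https?://)[^\[\]|]+(?:\|[^\[\]]*)?\]\]",
        re.IGNORECASE,
    ),
    # One or more '*' at the start of a line (bullet list item, any depth)
    "bullet_item": re.compile(r"^\*+", re.MULTILINE),
    # External URL wrapped in double square brackets, e.g. [[http://...]]
    "double_bracket_external": re.compile(r"\[\[https?://", re.IGNORECASE),
}


def count_constructs(pages):
    """Tally how often each pattern matches across an iterable of page texts."""
    totals = Counter()
    for text in pages:
        for name, pattern in PATTERNS.items():
            totals[name] += len(pattern.findall(text))
    return totals


if __name__ == "__main__":
    sample = [
        "See [[Paris]] and [[London|the capital]].\n* one\n* two\n"
        "[[File:Map.png|thumb]]",
        "** nested\n[[http://example.com]]",
    ]
    print(count_constructs(sample))
```

Feeding it a dump, one page text at a time, would give the per-construct totals. A real parser would then be checked against those numbers rather than the other way round.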

Cheers,
-- jra
-- 
Jay R. Ashworth                   Baylink                      [EMAIL PROTECTED]
Designer                     The Things I Think                       RFC 2100
Ashworth & Associates     http://baylink.pitas.com                     '87 e24
St Petersburg FL USA      http://photo.imageinc.us             +1 727 647 1274

             Those who cast the vote decide nothing.
             Those who count the vote decide everything.
               -- (Joseph Stalin)


_______________________________________________
Wikitext-l mailing list
[email protected]
https://lists.wikimedia.org/mailman/listinfo/wikitext-l