2010-08-10 12:45, Thomas Dalton wrote:
> On 10 August 2010 11:09, David Gerard<[email protected]>  wrote:
>    
>> On 9 August 2010 17:04, Mark Clements (HappyDog)<[email protected]> wrote:
>>
>>      
>>> This kind of unexpected edge-case is arguably something that should be fixed
>>> in any formal markup specification.
>>>        
>>
>> How prevalent is it in actual wikitext? Is it an edge case people
>> actually use much, or are all instances of it basically errors?
>> That'll be the question.
>>      
> Its only potential use is in making the wikitext more easily readable,
> which doesn't seem important enough to warrant just a weird edge-case.
> Any formal spec is going to end up breaking things, that can't really
> be helped (unless we just write down a spec for the current behaviour,
> bugs and all, which sounds like a lost opportunity to me).
>
>    

If you consider the large body of information tied to MediaWiki
syntax, it is likely that for any border case, there is a revision of
some page that will trigger that border case.

Regarding strategy on how to replace the MediaWiki parser, I can
see two extremes:

1. Search out all weird edge cases and reproduce them in parser rules.
    Walk through the revisions of Wikipedia and for each edge case, note
    all revisions for which the parser rule for the edge case is
    executed.  Based on the data determine which edge cases can be
    safely removed.  Or define a conversion for the content.

2. Don't support any edge cases.  Just consider the content broken and
    let the wiki users themselves fix it.  Historic revisions of pages
    will be permanently broken.
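
As a rough illustration of the survey step in strategy 1, the Python
sketch below records which revisions trigger which edge case.  The
detector names and regular expressions are hypothetical stand-ins, not
rules taken from any real parser, and the revisions are assumed to
arrive as (revision id, wikitext) pairs.

```python
import re

# Hypothetical edge-case detectors; a real survey would use the parser's
# own rules rather than regular expressions.
EDGE_CASES = {
    "apostrophe_in_internal_link": re.compile(r"\[\[[^\]\|]*''[^\]]*\]\]"),
    "text_before_first_table_row": re.compile(
        r"^\{\|[^\n]*\n(?![|!])[^\n]+", re.MULTILINE
    ),
}


def survey(revisions):
    """Map each edge-case name to the ids of the revisions that trigger it."""
    hits = {name: [] for name in EDGE_CASES}
    for rev_id, text in revisions:
        for name, pattern in EDGE_CASES.items():
            if pattern.search(text):
                hits[name].append(rev_id)
    return hits
```

An edge case whose list stays empty, or nearly empty, over the whole
revision history would be a candidate for removal or for a one-off
content conversion.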

I am trying to support as many edge cases as is reasonable in
my attempt to write a new parser.  It seems, however, that the current
MediaWiki parser is under active development, and backwards
compatibility with edge cases may not be much of a concern.  For
instance, in 1.16.0beta3 we have:

         $text = $this->doAllQuotes( $text );
         $text = $this->replaceInternalLinks( $text );
         $text = $this->replaceExternalLinks( $text );

which in trunk is:

         $text = $this->replaceInternalLinks( $text );
         $text = $this->doAllQuotes( $text );
         $text = $this->replaceExternalLinks( $text );

So, it is now possible to have apostrophes in internal links, but
still not in external links.
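
Why the order of the passes matters can be shown with a toy model in
Python.  This is only a sketch: the function names, the regular
expressions and the placeholder scheme are assumptions made for
illustration, and none of it is MediaWiki's actual code.

```python
import re


def do_quotes(text):
    # Toy quote pass: ''x'' becomes <i>x</i>.
    return re.sub(r"''(.+?)''", r"<i>\1</i>", text)


def replace_internal_links(text, holders):
    # Toy link pass: [[target]] is swapped for an opaque placeholder so
    # that later passes cannot touch the link target.
    def hold(match):
        holders.append(match.group(1))
        return f"\x00LINK{len(holders) - 1}\x00"

    return re.sub(r"\[\[([^\[\]<>]+)\]\]", hold, text)


def expand_links(text, holders):
    # Final step: turn each placeholder back into an anchor element.
    def expand(match):
        target = holders[int(match.group(1))]
        return f'<a href="{target}">{target}</a>'

    return re.sub(r"\x00LINK(\d+)\x00", expand, text)


def render(text, links_first):
    holders = []
    if links_first:
        text = replace_internal_links(text, holders)
        text = do_quotes(text)
    else:
        text = do_quotes(text)
        text = replace_internal_links(text, holders)
    return expand_links(text, holders)
```

With the input "[[A''s]] and [[B''s]]", running the quote pass first
pairs up the two apostrophe runs across the links and neither link
survives; running the link pass first leaves both link targets intact.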

From the parser's point of view, the edge cases can be divided into
"harmless", where a rule to support them does not increase the
complexity of the parser significantly, and "harmful", where adding a
rule to support them would either dramatically increase the size of
the parser or make it possible to craft content that takes more
than linear time or memory to process.  The edge cases surrounding
links definitely fall into the harmful category.  I will be writing a
separate post about links later.
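
One hypothetical way such super-linear behaviour can arise (not
necessarily the case meant above) is unbounded lookahead: if every
"[[" makes the parser scan ahead for a matching "]]", an input made
only of opening brackets forces a scan to the end of the text from
every position.  The Python sketch below just counts those lookahead
steps; the function name is made up for the illustration.

```python
def lookahead_steps(text):
    """Count the characters inspected while searching for "]]" after each "[["."""
    steps = 0
    for i in range(len(text) - 1):
        if text.startswith("[[", i):
            j = i + 2
            # Unbounded lookahead: scan until "]]" or the end of the input.
            while j < len(text) - 1 and not text.startswith("]]", j):
                steps += 1
                j += 1
    return steps
```

Doubling the length of an input such as "[[" repeated n times roughly
quadruples the step count, which is the quadratic growth a rule in the
"harmful" category would allow.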

Maybe it would be a good idea to provide some feedback to the user
regarding bad syntax.  In my parser implementation, I am considering
generating special events for syntax that should be avoided.  For
instance:

begin_table:
     begin = BEGIN_TABLE NEWLINE*
     (
         {
            X->beginGarbageBlock(X, "Unsupported syntax: content between "
                                    "the {| and the first column in a table.");
         }
         ((inline_element)=> garbage_inline_text NEWLINE* )*
         block_elements?
         {
            X->endGarbageBlock(X);
         }
     )*
     {
        X->beginTable(X, $begin->custom);
     }
     ;

This could, for instance, be rendered in HTML as: <div class="garbage"
title="Unsupported syntax: content between ..."> </div>.
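
A consumer of such events might look roughly like the Python sketch
below.  The class and method names are hypothetical; they merely
mirror the beginGarbageBlock/endGarbageBlock events from the grammar
above and are not part of any existing implementation.

```python
from html import escape


class HtmlRenderer:
    """Collects parser events and renders garbage blocks as marked-up divs."""

    def __init__(self):
        self.parts = []

    def begin_garbage_block(self, message):
        # The message becomes a tooltip, so it must be attribute-escaped.
        title = escape(message, quote=True)
        self.parts.append(f'<div class="garbage" title="{title}">')

    def text(self, content):
        self.parts.append(escape(content))

    def end_garbage_block(self):
        self.parts.append("</div>")

    def result(self):
        return "".join(self.parts)
```

A stylesheet could then highlight or hide elements of class "garbage",
so that editors see where the unsupported syntax is without the page
failing to render.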

/Andreas