Re: [Wikitext-l] On HTML elements in wikitext

Andreas Jonsson Wed, 18 Aug 2010 01:53:25 -0700

Hi Daniel,

2010-08-18 09:24, Daniel Kinzler wrote:
> Andreas Jonsson wrote:
>    
>> Mixing HTML elements with wikitext is a grey area.  How the HTML tags
>> in the wikitext interact with the wikitext elements does not seem very
>> well defined.  Therefore, I will make up some rules
>> where I try to preserve any legitimate use of html elements, but with
>> some restrictions to avoid some problems:
>>
>> 1. Do not allow html block elements inside wikitext lists.  For example
>>      this is no longer allowed:
>>
>>      * item1<li>  item2
>>      
> What does "not allowed" mean, exactly? What happens if the user enters this?
> As the old mantra goes, any text is valid wikitext.
>
>    
I mean that the character sequence "<li>" will not be a token in this 
context.  (It will become three tokens: SPECIAL[<], WORD[li] and 
SPECIAL[>], which should eventually be rendered as &lt;li&gt; by a html 
rendering client.)  Of course, the lexical scanner will accept any 
sequence of characters.
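
To illustrate the idea, here is a minimal Python sketch of a scanner with a
context flag.  It is not the actual lexer: the function names, the token
names other than SPECIAL and WORD, and the set of block tags are all
assumptions made for the example, and tags with attributes are not handled.

```python
import html
import re

# Hypothetical set of block tags; the real list is not given in this thread.
BLOCK_TAGS = {"li", "ul", "ol", "dl", "dd", "dt", "div", "table", "p"}
TAG_RE = re.compile(r"<(/?)([a-zA-Z][a-zA-Z0-9]*)\s*>")
PIECE_RE = re.compile(r"(?P<WORD>\w+)|(?P<SPACE>\s+)|(?P<SPECIAL>[^\w\s])")

def scan(text, in_wikitext_list):
    """Return (kind, value) tokens; block tags are disabled inside a list."""
    tokens = []
    pos = 0
    while pos < len(text):
        m = TAG_RE.match(text, pos)
        if m and m.group(2).lower() in BLOCK_TAGS and not in_wikitext_list:
            # The block tag token is enabled in this context.
            kind = "HTML_BLOCK_END" if m.group(1) else "HTML_BLOCK_BEGIN"
            tokens.append((kind, m.group(2).lower()))
            pos = m.end()
            continue
        # Otherwise fall back to plain pieces; any character matches here.
        m = PIECE_RE.match(text, pos)
        tokens.append((m.lastgroup, m.group(0)))
        pos = m.end()
    return tokens

def render(tokens):
    """Render tokens as HTML, escaping everything that is not a tag token."""
    out = []
    for kind, value in tokens:
        if kind == "HTML_BLOCK_BEGIN":
            out.append("<%s>" % value)
        elif kind == "HTML_BLOCK_END":
            out.append("</%s>" % value)
        else:
            out.append(html.escape(value))
    return "".join(out)

print(scan("item1<li> item2", in_wikitext_list=True))
print(render(scan("item1<li> item2", in_wikitext_list=True)))
```

With the flag set, "<li>" comes out as SPECIAL[<], WORD[li], SPECIAL[>], and
the renderer escapes it; with the flag cleared, the same input yields a single
block tag token.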


> So, I think it would make more sense to say that html block elements
> *terminate* wikitext lists.
>
>    
That would be a reasonable alternative.  But I think that it is better 
to disable html block element tokens, because I don't think that it is a 
useful feature to make it possible to terminate a list item with 
anything but a newline or end of file.  I think that it would just be 
more confusing for the users.

>> 2. Do not allow table html tags inside wikitext tables, unless opened
>>      up by a nested html table, which disables wikitext table tokens
>>      until the html table is properly closed:
>>      
> ....
>    
>>      So, we'll get two different kinds of table contexts, which may be
>>      arbitrarily nested, but not mixed.
>>      
> As long as arbitrary nesting is supported, I'm all for it! Mixing html and
> wiki syntax for table elements leads to a mess with the current parser anyway.
>
>    
>> * <img [attributes]>  Same as <br> except that it is enabled/disabled
>>     via a configuration option.
>>      
> Additional restrictions may be imposed on any attribute that contains a URL.
>
>    

At the moment I'm working on the lexer, which will attach a list of 
whatever looks like attributes to the corresponding token.  Filtering 
the attribute list will be performed at a higher level.  As I understand 
it, the attribute list will never affect whether an opening tag should be 
treated as a token or not.  This is still a br tag:

<br complete *)()()(UF*(*garbage/>
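
As a rough illustration, the Python sketch below recognises an opening br
tag regardless of what follows the tag name and attaches the raw,
unfiltered pieces to the token.  The regular expression, the function name
and the token shape are assumptions for the example, not the actual lexer.

```python
import re

# Hypothetical pattern: "<br", a word boundary, anything up to the closing
# ">" (with an optional "/" before it).  Nothing here validates attributes.
BR_RE = re.compile(r"<br\b([^<>]*?)/?>", re.IGNORECASE)

def scan_br(text):
    """Return ("BR", raw_attribute_list) if text starts with a br tag."""
    m = BR_RE.match(text)
    if m is None:
        return None
    # Whitespace-separated pieces; filtering is left to a higher level.
    return ("BR", m.group(1).split())

print(scan_br("<br complete *)()()(UF*(*garbage/>"))
print(scan_br("<break>"))
```

The garbage after the tag name ends up in the attribute list, but it does
not change the decision that this is a br token; "<break>" is rejected
because of the tag name alone.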

>> * <p [attributes]>  Opening tag enables closing tag </p> and disables
>>     itself until the end of the current inlined text.  <p> opens up a
>>     new paragraph, </p> closes the current inlined text.
>>      
> Not sure <p> should be disabled after <p>. Most browsers treat <p>...<p> as
> <p>...</p><p>. That makes more sense, I think.
>
>    
That's MediaWiki's current behaviour.

<p> foo <p> foo </p>

is rendered as

<p> foo &lt;p&gt; foo </p>
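
The rule can be sketched in a few lines of Python.  This is only an
illustration of the enable/disable logic for a single run of inline text;
the function name is made up, attributes are ignored, and closing an
unclosed <p> at the end of the inline text is left out.

```python
import html
import re

# Split on bare <p> and </p>, keeping the tags as separate pieces.
P_TAG_RE = re.compile(r"(</?p>)")

def render_inline(text):
    """Render one run of inline text, applying the <p> rule described above."""
    out = []
    p_open = False  # True while <p> is disabled and </p> is enabled
    for piece in P_TAG_RE.split(text):
        if piece == "<p>" and not p_open:
            p_open = True
            out.append("<p>")
        elif piece == "</p>" and p_open:
            p_open = False
            out.append("</p>")
        else:
            # A disabled tag is treated like any other text and escaped.
            out.append(html.escape(piece))
    return "".join(out)

print(render_inline("<p> foo <p> foo </p>"))
```

For the input above this gives the same output as shown: the second <p> is
escaped because the token is disabled while the first one is open.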


>> * Inlined html elements.  These can be used for long term formatting.
>>     The context will make sure they are correctly nested, closed on end
>>     of inlined text and reopened at beginning of inlined text.  They are
>>     permanently closed at the corresponding end tag, or at end of
>>     article.  Variants:
>>      
> Do we really want inline formatting to span across blocks? I find that very
> quirky. I think the format should simply end at the end of the block, that's
> it. Interleaved markup is evil.
>
>    
That's MediaWiki's current behaviour.

>> * Block html elements.  Start and end tags terminate inline text.
>>     (They may _not_ be nested inside paragraphs.)
>>      
> That is: they *terminate* paragraphs.
>
>    
Inline text is not necessarily contained in a paragraph.

>> Inline text inside
>> <ol> and <ul> implies <li>, inlined text inside <dl> implies <dd>,
>>      
> fine
>
>    
>>     inline text inside <div> implies <p>,
>>      
> err. whot? no! <p> usually implies margins/padding. if i use <div>foo</div>, i
> generally do not want any margins/padding!
>
>    

Sorry, sometimes I confuse html with DocBook, where all inline text must 
stand inside <para> tags.

>> inline text inside <table>
>>     implies <tbody><tr><td>, <h1>-<h6> disables wikitext block element
>>     tokens, in addition to all html block element tokens except the
>>     corresponding closing </hX> token.
>>      
> What exactly does "disable" mean here? Do they get stripped? Or displayed
> verbatim?
>    

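The corresponding tokens are disabled in the lexical scanner, so the 
characters are scanned as ordinary text and should end up displayed 
verbatim, as in the list example above.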

>> * <pre> disables all html elements and all block elements (both wikitext
>>      and html block elements).
>>      
> <pre> should disable *all* markup except </pre>. It's actually a lot
> like <nowiki>.
>
> Lines starting with blanks (please include tabs here!), in contrast, become
> pre-formatted, but still allow inline formatting, auto-linking URLs, etc.
>
>    

Thanks, I had missed that.  I just assumed that they were equivalent.
It seems that block html elements in an indented line take precedence:

     Preformatted text? <li> No!

Rendered as:

Preformatted text?
<li> No! </li>

That'll require an extra lookahead on all indented lines. *sigh*
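
A minimal Python sketch of such a lookahead follows.  The list of block
tags, the function name and the return values are assumptions for the
example, and counting tabs as indentation follows the request above rather
than any confirmed behaviour.

```python
import re

# Hypothetical list of block tags; \b keeps "<p" from matching "<pre>".
BLOCK_TAG_RE = re.compile(
    r"</?(?:li|ul|ol|dl|dd|dt|div|table|p|h[1-6])\b[^<>]*>",
    re.IGNORECASE,
)

def classify_line(line):
    """Return "PRE" for an indented line, unless a block tag overrides it."""
    indented = line[:1] in (" ", "\t") and line.strip() != ""
    if indented and BLOCK_TAG_RE.search(line) is None:
        return "PRE"
    # Not indented, blank, or a block html element takes precedence.
    return "NORMAL"

print(classify_line("     Preformatted text? <li> No!"))
print(classify_line("     Preformatted text"))
```

The whole line has to be examined before the scanner can decide how to
treat its first character, which is the extra cost.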

>> * <ins> and <del> will be inline if occurring inside inlined text.
>>     Otherwise block.
>>
>> * <a> disables wikitext link tokens.
>>      
> <a> is not allowed at the moment. I once tried to add support for it, but got
> reverted for technical reasons. We might add it to support RDFa (semantic
> relations).
>    

You are right, it isn't.  Yippihe! :-)

>> * Tag extensions are treated like <nowiki>; the contents are passed
>>     verbatim to the corresponding callback function.  The parser may be
>>     called recursively if the extension needs to parse wikitext.
>>      
> Please note that the HTML returned from tag extensions is, at the moment,
> *not* passed verbatim, though it very likely should. See bug 1319, compare bug
> 12974.
>
>    
I haven't analysed the tag extensions completely yet.  But I assume that 
the content isn't touched by the parser, and that if the extension wants 
anything inserted into the output stream, it must call the parser 
recursively.
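
Under that assumption, the mechanism could look roughly like the Python
sketch below.  Everything in it is hypothetical: the callback signature,
the "raw" and "wrap" extension names, and the decision to insert the
callback's return value as is (which is the open question in the bugs
mentioned above).

```python
import html
import re

# Matches <name>...</name> with no attributes; a deliberate simplification.
EXT_RE = re.compile(r"<(\w+)>(.*?)</\1>", re.DOTALL)

def parse(text, extensions):
    """Escape plain text; hand extension tag contents to callbacks untouched."""
    out = []
    pos = 0      # start of the text not yet emitted
    search = 0   # where to look for the next candidate tag
    while True:
        m = EXT_RE.search(text, search)
        if m is None:
            break
        name = m.group(1)
        if name not in extensions:
            search = m.start() + 1  # not an extension; keep looking inside
            continue
        out.append(html.escape(text[pos:m.start()]))
        # The content is passed verbatim, together with a way to recurse.
        recurse = lambda t: parse(t, extensions)
        out.append(extensions[name](m.group(2), recurse))
        pos = search = m.end()
    out.append(html.escape(text[pos:]))
    return "".join(out)

def raw_ext(content, recurse):
    # Never parses its content as wikitext.
    return "<pre>" + html.escape(content) + "</pre>"

def wrap_ext(content, recurse):
    # Calls the parser recursively on its content.
    return "<div>" + recurse(content) + "</div>"

EXTENSIONS = {"raw": raw_ext, "wrap": wrap_ext}
print(parse("a <raw>[[x]] <i></raw> z", EXTENSIONS))
print(parse("<wrap>1 < 2 <raw>x</raw></wrap>", EXTENSIONS))
```

The parser itself never looks inside an extension tag; whether the content
gets parsed at all is decided by the callback.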

> Thanks for your great work!
> -- daniel
>
> _______________________________________________
> Wikitext-l mailing list
> [email protected]
> https://lists.wikimedia.org/mailman/listinfo/wikitext-l
>
>    


