Re: [Wikitext-l] New parser: Kiwi

Karl Matthias Wed, 02 Feb 2011 20:22:05 -0800

On Wed, Feb 2, 2011 at 3:08 PM, Platonides <[email protected]> wrote:
> I approach it as a tool which could work for the bigger parser, though.
> Currently, it looks as just another wiki syntax, looking similar to
> MediaWiki one.


I think it is a tool that shows promise in that regard as well.  With
regard to "just another syntax": we can probably support all, or at
least most, of the important edge cases using this methodology.
That will make the grammar much uglier, but it can probably work.  The
question is to what lengths you go to support poorly formed markup.  That
answer will probably be different based on the accumulated content at
various sites.  Our parser isn't too tolerant right now of poorly
formed markup.  On our site that's ok.  If people want to help us make
it more tolerant we'd be interested in seeing how that turns out.  I
suspect it could at least double the size of the grammar based on what
Ward tells me that Dirk Riehle's group found with WikiCreole.  But a
community effort could probably make it doable.

> It doesn't seem to be legal html*, so I wouldn't justify it just as a
> "design decision". Same could be argued for nested <p> tags.

It's not 100% legal right now and the most egregious spot is the
paragraph tags.  It can be modified but doing it this way got it off
the ground faster.  Hence it was a design decision.  But we probably
will modify it to behave better in that regard.  If someone wants to
contribute the changes to do it, that will make it happen much faster
as it's low on the list right now.  Fork it on GitHub and go for it!
Make the changes and submit a pull request and we'll review it.  Note
that MediaWiki doesn't generate 100% valid markup (but it's cleaner
than ours right now!).

> * opening the <hX> seems to implicitly close the previous <p>, leading
> to an unmatched </p>.

I hadn't noticed this, I'll check that out.  Thanks!
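
One way to spot the problem described above is a small check over the
generated HTML. This is only a sketch using Python's standard
html.parser, not part of Kiwi; the class name and the simplified rule
(a heading start tag implicitly closes an open <p>) are assumptions
made for illustration.

```python
from html.parser import HTMLParser

HEADINGS = {"h1", "h2", "h3", "h4", "h5", "h6"}


class StrayParagraphCloseFinder(HTMLParser):
    """Records the position of every </p> that has no open <p> to close."""

    def __init__(self):
        super().__init__()
        self.p_open = False
        self.stray = []  # (line, column) of each unmatched </p>

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            self.p_open = True
        elif tag in HEADINGS:
            # Simplified HTML rule: a heading implicitly closes an open <p>.
            self.p_open = False

    def handle_endtag(self, tag):
        if tag == "p":
            if self.p_open:
                self.p_open = False
            else:
                self.stray.append(self.getpos())


def stray_paragraph_closes(html):
    finder = StrayParagraphCloseFinder()
    finder.feed(html)
    finder.close()
    return finder.stray
```

Feeding it "<p>intro<h2>Title</h2></p>" reports one stray </p>, while
"<p>intro</p><h2>Title</h2>" reports none.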

> Templates [...snip...]
> I supposed that it was something like that, but it was odd that it did
> such conversion instead of leaving them as literals in such case.
> I used just the parser binary. I have been looking at the ruby code and,
> despite the foreign language, understanding a bit more of how it works.

The replacement with the hashed tag is done so that we can use a
simple context-unaware string replacement on the output.  If we left
them in the original form we would have to know the difference between
a template call inside noinclude tags and one that isn't--at render
time when we have no state on the document.  Given that the help info
for many templates shows exact calls to the template placed within
noinclude tags, this would be a common bug.  It's not the only
possible solution but it's a simple one.
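
The idea can be sketched roughly as follows. This is an illustration
only, not Kiwi's actual code: the placeholder format, the regular
expressions, and the function names are all hypothetical, and the
handling of noinclude is reduced to "leave the text alone".

```python
import hashlib
import re

# Hypothetical, simplified patterns; real template syntax is richer.
TEMPLATE_CALL = re.compile(r"\{\{([^{}|]+)(?:\|[^{}]*)?\}\}")
NOINCLUDE = re.compile(r"(<noinclude>.*?</noinclude>)", re.DOTALL)


def placeholder(name):
    """Build an opaque tag for a template name (format is an assumption)."""
    digest = hashlib.sha1(name.strip().encode("utf-8")).hexdigest()
    return "<!--tpl:" + digest + "-->"


def parse(wikitext):
    """Parse time: context is known, so only calls outside noinclude are tagged."""
    templates = {}  # placeholder tag -> template name
    out = []
    for chunk in NOINCLUDE.split(wikitext):
        if chunk.startswith("<noinclude>"):
            out.append(chunk)  # calls in here stay as literal text
            continue

        def swap(match):
            tag = placeholder(match.group(1))
            templates[tag] = match.group(1).strip()
            return tag

        out.append(TEMPLATE_CALL.sub(swap, chunk))
    return "".join(out), templates


def render(parsed, templates, bodies):
    """Render time: no document state, just plain string replacement."""
    for tag, name in templates.items():
        parsed = parsed.replace(tag, bodies.get(name, ""))
    return parsed
```

Because the literal call inside the noinclude section was never turned
into a placeholder, render() cannot touch it by accident, which is the
bug the plain-text approach would have to guard against.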

>> Like templates, images require some different solutions if the parser is
>> to be decoupled.[...snip...]
>
> A parser shouldn't really need to handle images. At most it would
> provide a callback so that the app could do something with the image urls.

We deliberately avoid callbacks so that we can separate the parser
completely from the calling code.  Our design would put the
information in a place where a calling application can get to it (e.g.
the list of Templates).  But consider that MediaWiki actually does
handle images and adds markup for height and width, etc.  It makes
database calls to determine "bad images", etc.  This is something a
separate parser can't do in the same way.  A mechanism needs to be put
in place to allow the calling application to do this work if it so
chooses. It's fairly straightforward to do it.
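
A rough sketch of that kind of mechanism, with no callbacks involved:
the parser returns its output together with a list of the images it
saw, and the calling application decides what to do afterwards. The
names here (ParseResult, apply_image_policy, the image regex) are
hypothetical and not Kiwi's API; escaping and sizing are left out.

```python
import re
from dataclasses import dataclass, field

# Hypothetical, simplified image-link pattern.
IMAGE_LINK = re.compile(r"\[\[(?:Image|File):([^\]|]+)(?:\|[^\]]*)?\]\]")


@dataclass
class ParseResult:
    html: str
    images: list = field(default_factory=list)  # image names, in order seen


def parse(wikitext):
    """Parser side: emit markup and record images; no application code runs here."""
    images = []

    def to_img(match):
        name = match.group(1).strip()
        images.append(name)
        return '<img src="%s" />' % name

    html = IMAGE_LINK.sub(to_img, wikitext)
    return ParseResult(html=html, images=images)


def apply_image_policy(result, bad_images):
    """Application side: e.g. drop images that a database lookup flags as bad."""
    html = result.html
    for name in result.images:
        if name in bad_images:
            html = html.replace('<img src="%s" />' % name, "")
    return html
```

The set passed as bad_images stands in for whatever lookup the calling
application does (a database query in MediaWiki's case); the parser
never needs to know about it.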

>> More work is
>> needed in this area, though if you check out http://kiwi.drasticcode.com
>> you can see that most image support is working (no resizing).  You can
>> also experiment with the parser there as needed.
>
> The URL mapping used there makes some titles impossible to use, such as
> making an entry for [[Edit]] - http://en.wikipedia.org/wiki/Edit

You are right about that.  I'm sure Sam would be happy to accept
contributions to change that.  The site does support double-click to
edit, though, so making links to Edit is kind of unnecessary.

>
> Just code lurking for now :)

No worries, the feedback is appreciated.

Cheers,
Karl

_______________________________________________
Wikitext-l mailing list
[email protected]
https://lists.wikimedia.org/mailman/listinfo/wikitext-l
