Karl Matthias wrote:
> I'm one of the authors of the Kiwi parser and will be presenting it at
> the Data Summit on Friday. The parser is pretty complete but certainly
> we could use some community support and we encourage feedback and
> participation! It is a highly functional tool already but it can use
> some polish. It does actually handle most wikitext, though not
> absolutely everything.
>
> From your post I can see that you are experiencing a couple of design
> decisions we made in writing this parser. We did not set out to match
> the exact HTML output of MediaWiki, only to output something that will
> look the same in the browser. This might not be the best approach, but
> right now this is the case. Our site doesn't have the same needs as
> Wikipedia so when in doubt we leaned toward what suited our needs and
> not necessarily ultimate tolerance of poor syntax (though it is somewhat
> flexible).

I felt bad about pointing out issues right after my first try. I
understand that you have much less content than Wikipedia and can use
just a subset of the markup without worrying about corner cases. I
approach it as a tool that could work for the bigger parser, though.
Currently it looks like just another wiki syntax that happens to
resemble MediaWiki's.

> Another design decision is that everything that you put in
> comes out wrapped in paragraph tags. Usually this wraps the whole
> document, so if your whole document was just a heading, then yes it is
> wrapped in paragraph tags. This is probably not the best way to handle
> this but it's what it currently does. Feel free to contribute a
> different solution.

That doesn't seem to be legal HTML*, so I wouldn't justify it as just a
"design decision". The same could be argued for nested <p> tags.

* Opening the <hX> seems to implicitly close the previous <p>, leading
to an unmatched </p>.

> Templates, as you probably know, require full integration with an
> application to work in the way that MediaWiki handles them, because they
> require access to the data store, and possibly other configuration
> information. We built a parser that works independently of the data
> store (indeed, even on the command line in a somewhat degenerate form).
> In order to do that, we had to decouple template retrieval from the
> parse. If you take a look in the Ruby FFI examples, you will see a more
> elegant handling of templates(though it needs work). When a document is
> parsed, the parser library makes available a list of templates that were
> found, the arguments passed to the template, and the unique replacement
> tag in the document for inserting the template once rendered. Those
> underscored tags that come out are not a bug, they are those unique
> tags.

I supposed it was something like that, but it was odd that it did such
a conversion instead of leaving them as literals in that case. I used
just the parser binary. I have been looking at the Ruby code and,
despite the unfamiliar language, I now understand a bit more of how it
works.

> Like templates, images require some different solutions if the parser is
> to be decoupled. Our parser does not re-size images, store them, etc.
> It just works with image URLs.
> If your application requires images to
> be regularized, you would need to implement resizing them at upload, or
> lazily at load time, or whatever works in your scenario.

A parser shouldn't really need to handle images. At most it would
provide a callback so that the app could do something with the image
URLs.

> More work is
> needed in this area, though if you check out http://kiwi.drasticcode.com
> you can see that most image support is working (no resizing). You can
> also experiment with the parser there as needed.

The URL mapping used there makes some titles impossible to use, such as
an entry for [[Edit]] - http://en.wikipedia.org/wiki/Edit

> Hope that at least helps explain what we've done. Again, feedback and
> particularly code contributions are appreciated!
>
> Cheers,
> Karl

Just code lurking for now :)

_______________________________________________
Wikitext-l mailing list
https://lists.wikimedia.org/mailman/listinfo/wikitext-l
