2011-02-02 01:48, Karl Matthias wrote:
> Apologies... even the second attempt was truncated it seems. Here's
> one final try
You have been hit by the same problem I was a few days ago on this list.
You have a line that starts with "From your" in the text.

/Andreas

> Karl
> -----------
> Alan Post wrote:
> > Interesting. Is the PEG grammar available for this parser?
> >
> > -Alan
> >
> > It's at https://github.com/AboutUs/kiwi/blob/master/src/syntax.leg
> > Get peg/leg from http://piumarta.com/software/peg/
> >
> > I just tried it and already found a bug on the first Hello World (it
> > surrounds headers inside paragraphs).
> > It strangely converts templates into underscored words. They may be
> > expecting some other parser piece to restore it. I'm pretty sure there
> > are corner cases in the preprocessor (e.g. just looking at the peg file
> > they don't handle mixed case noincludes), but I don't think that should
> > need to be handled by the parser itself.
> >
> > The grammar looks elegant. I doubt it can really handle full wikitext.
> > But it would be so nice if it did...
>
> I'm one of the authors of the Kiwi parser and will be presenting it at
> the Data Summit on Friday. The parser is pretty complete but
> certainly we could use some community support and we encourage
> feedback and participation! It is a highly functional tool already
> but it can use some polish. It does actually handle most wikitext,
> though not absolutely everything.
>
>>From your post I can see that you are experiencing a couple of design
> decisions we made in writing this parser. We did not set out to match
> the exact HTML output of MediaWiki, only to output something that will
> look the same in the browser. This might not be the best approach,
> but right now this is the case. Our site doesn't have the same needs
> as Wikipedia so when in doubt we leaned toward what suited our needs
> and not necessarily ultimate tolerance of poor syntax (though it is
> somewhat flexible). Another design decision is that everything that
> you put in comes out wrapped in paragraph tags.
> Usually this wraps
> the whole document, so if your whole document was just a heading, then
> yes it is wrapped in paragraph tags. This is probably not the best
> way to handle this but it's what it currently does. Feel free to
> contribute a different solution.
>
> Templates, as you probably know, require full integration with an
> application to work in the way that MediaWiki handles them, because
> they require access to the data store, and possibly other
> configuration information. We built a parser that works independently
> of the data store (indeed, even on the command line in a somewhat
> degenerate form). In order to do that, we had to decouple template
> retrieval from the parse. If you take a look in the Ruby FFI
> examples, you will see a more elegant handling of templates (though it
> needs work). When a document is parsed, the parser library makes
> available a list of templates that were found, the arguments passed to
> the template, and the unique replacement tag in the document for
> inserting the template once rendered. Those underscored tags that come
> out are not a bug, they are those unique tags. There is a switch to
> disable templates and in that case it just swallows them instead. So
> the template handling workflow (simplistically) is:
>
> 1. Parse original document and generate list of templates,
>    arguments, replacement tags
> 2. Fetch first template, if there is no recursion needed, insert
>    into original document
> 3. Fetch next template, etc
>
> We currently recurse 6 templates deep in the bindings we built for
> AboutUs.org (sysop-only at the moment). Template arguments don't work
> right now, but it's fairly trivial to do it. We just haven't done it
> yet.
>
> Like templates, images require some different solutions if the parser
> is to be decoupled. Our parser does not re-size images, store them,
> etc. It just works with image URLs.
> If your application requires
> images to be regularized, you would need to implement resizing them at
> upload, or lazily at load time, or whatever works in your scenario.
> More work is needed in this area, though if you check out
> http://kiwi.drasticcode.com you can see that most image support is
> working (no resizing). You can also experiment with the parser there
> as needed.
>
> Hope that at least helps explain what we've done. Again, feedback and
> particularly code contributions are appreciated!
>
> Cheers,
> Karl
>
> _______________________________________________
> Wikitext-l mailing list
> [email protected]
> https://lists.wikimedia.org/mailman/listinfo/wikitext-l
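The template workflow described in the quoted message (parse the document, collect the templates with their arguments and unique replacement tags, then fetch and insert each one, recursing up to a fixed depth) could be sketched roughly as below. This is only an illustration under stated assumptions, not Kiwi's code or API: a regular expression stands in for the real PEG-based parser, the names `parse`, `render`, `TemplateRef` and the `__template_N__` tag format are hypothetical, template arguments are collected but not substituted, and swallowing templates once the depth limit is reached is a guess at reasonable behaviour.

```python
import re
from typing import Callable, NamedTuple

# Stand-in for the parser: matches simple {{Name}} or {{Name|arg1|arg2}} calls.
# The real Kiwi parser is a PEG grammar; this regex is for illustration only.
TEMPLATE_RE = re.compile(r"\{\{([^{}|]+)((?:\|[^{}|]*)*)\}\}")


class TemplateRef(NamedTuple):
    name: str
    args: list[str]
    tag: str  # unique replacement tag left behind in the parsed text


def parse(text: str) -> tuple[str, list[TemplateRef]]:
    """Step 1: swap each template call for a unique tag and list what was found."""
    found: list[TemplateRef] = []

    def swap(match: re.Match) -> str:
        tag = f"__template_{len(found)}__"   # hypothetical tag format
        args = match.group(2).split("|")[1:]  # "|a|b" -> ["a", "b"]
        found.append(TemplateRef(match.group(1).strip(), args, tag))
        return tag

    return TEMPLATE_RE.sub(swap, text), found


def render(text: str, fetch: Callable[[str], str], max_depth: int = 6) -> str:
    """Steps 2 and 3: fetch each template and insert it at its tag.

    Fetched templates are rendered recursively; once max_depth levels have
    been expanded, any remaining templates are swallowed (an assumption).
    """
    parsed, templates = parse(text)
    for ref in templates:
        if max_depth <= 0:
            body = ""
        else:
            # Arguments (ref.args) are available here but deliberately unused.
            body = render(fetch(ref.name), fetch, max_depth - 1)
        parsed = parsed.replace(ref.tag, body)
    return parsed


if __name__ == "__main__":
    # A dict plays the role of the application's data store.
    pages = {"Greeting": "Hello, {{Name}}!", "Name": "world"}
    print(render("{{Greeting}} Bye.", pages.__getitem__))  # -> Hello, world! Bye.
```

Keeping `fetch` as a plain callable mirrors the decoupling described in the message: the parsing step never touches the data store, and the application decides how templates are retrieved.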
