On Sun, Dec 13, 2009 at 6:19 PM, Tim Landscheidt <[email protected]> 
wrote:
> Hi,
>
> I'm currently maintaining wikilint (cf.
> <URI:http://toolserver.org/~timl/cgi-bin/wikilint>), which
> reviews Wikipedia articles for common problems. At the
> moment it is a powerful but ugly tangle of regular
> expressions. Fixing bugs is a nightmare.
>
>  Ideally, a redesign would parse the source into a tree-like
> structure and then work on that. So I went to CPAN and
> [[mw:Alternative parsers]] and found that:
>
> a) there are lots of "release early, release once"
>   "implementations" that do not do anything useful and do
>   not seem to be under further development, and
> b) for many people, "parser" seems to mean "converter".
>
> So I'll probably have to make another attempt. Since for
> wikilint I do not need to parse 100 % of all conceivable
> wiki markup (if an article cannot be parsed, it is probably
> broken anyway), I could go for a rather "lean" approach. For
> the tree structure, I would opt for the DOM, to maximize
> code reusability, with the wiki markup in a separate
> namespace. If there are no relevant foundations to build on,
> I would prefer Perl, ideally enhancing an existing CPAN
> module like WWW::Wikipedia::Entry.
>
>  Any pointers to things I have overlooked? Thoughts on
> interfaces & Co.? Volunteers? :-)

This falls more into the "converter" group, but
http://toolserver.org/~magnus/wiki2xml/w2x.php
generates pretty usable XML output, especially when you use the option
for API template resolution. You can run the source directly from the
command line, or simply query the tool via GET or POST.
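[A minimal sketch of querying the tool over HTTP. The endpoint URL is the
one given above; the form field names ("text", "whatsthis") are assumptions
only -- check the tool's HTML form for the real parameter names. The actual
network call is left commented out.]

```python
from urllib.parse import urlencode

# Endpoint from the mail above.
W2X_URL = "http://toolserver.org/~magnus/wiki2xml/w2x.php"

# Assumed field names -- verify against the tool's form before using.
params = {
    "text": "'''Hello''' [[world]]",  # wiki markup to convert
    "whatsthis": "wikitext",
}
body = urlencode(params)
print(body)

# To actually POST (network access required):
# from urllib.request import urlopen
# xml_out = urlopen(W2X_URL, data=body.encode()).read()
```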

Cheers,
Magnus

_______________________________________________
Wikitext-l mailing list
[email protected]
https://lists.wikimedia.org/mailman/listinfo/wikitext-l
