> I've been spending hours on the parsing now and don't find it simple
> at all due to the fact that templates can be nested. Just extracting
> the Infobox as one big lump is hard due to the need to match nested {{
> and }}
>
> Andrew Dunbar (hippietrail)

Hi,

Come now, you are over-thinking it. Find "{{Infobox [Ll]anguage" in
the text, then count braces. Start at depth=2 (for the opening "{{"),
count up for each "{" and down for each "}" until you reach 0, and you
are at the end of the template. (You can be picky and only count braces
that come in pairs, if you like ;-)

Then just regex match the lines/parameters you want.
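
A minimal sketch of the brace counting and the regex step in Python,
in case it helps. The names extract_infobox and get_param are made up
for illustration, it only counts braces that come in pairs, and it
assumes one parameter per line; it makes no attempt to handle
<nowiki>, HTML comments, or values that span several lines.

```python
import re


def extract_infobox(wikitext, pattern=r"\{\{Infobox [Ll]anguage"):
    """Return the whole template as one string, or None if absent/unclosed."""
    m = re.search(pattern, wikitext)
    if m is None:
        return None
    i = m.start() + 2  # step past the opening "{{"
    depth = 2          # two braces are open at this point
    while i < len(wikitext) and depth > 0:
        pair = wikitext[i:i + 2]
        if pair == "{{":
            depth += 2
            i += 2
        elif pair == "}}":
            depth -= 2
            i += 2
        else:
            i += 1     # ordinary character or an unpaired brace
    return wikitext[m.start():i] if depth == 0 else None


def get_param(template, name):
    """Return the value of a '|name = value' line, or None if not present."""
    rx = r"^[ \t]*\|[ \t]*" + re.escape(name) + r"[ \t]*=[ \t]*(.*)$"
    m = re.search(rx, template, re.MULTILINE)
    return m.group(1).strip() if m else None


sample = (
    "Intro {{Infobox language\n"
    "|name = Basque\n"
    "|family = {{nested|a|b}}\n"
    "|iso3 = eus\n"
    "}} trailing text"
)
box = extract_infobox(sample)
print(get_param(box, "name"), get_param(box, "iso3"))
```

The nested {{nested|a|b}} call raises the depth to 4 and drops it back
to 2, so the scan only stops at the "}}" that closes the infobox.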

However, if you are pulling the wikitext with the API, the XML parse
tree option sounds good; then you can just use ElementTree (or the
like) and pull out the parameters directly.
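
A rough sketch of the ElementTree route, again in Python. The SAMPLE
string is a hand-written stand-in for the parse tree XML, and its
shape (template / title / part / name / value elements) is my
assumption about what the API returns, so check it against real
output first. The function name infobox_params is made up, and nested
templates inside a value are simply flattened to their text.

```python
import xml.etree.ElementTree as ET

# Hand-written stand-in for the parse tree XML (assumed shape).
SAMPLE = (
    "<root><template><title>Infobox language\n</title>"
    "<part><name>name </name><equals>=</equals>"
    "<value> Basque\n</value></part>"
    "<part><name>iso3 </name><equals>=</equals>"
    "<value> eus\n</value></part>"
    "</template></root>"
)


def text_of(element):
    """Concatenate all text under an element and trim whitespace."""
    return "".join(element.itertext()).strip()


def infobox_params(xml_text, prefix="infobox language"):
    """Return {name: value} for the first matching template, else None."""
    root = ET.fromstring(xml_text)
    for tpl in root.iter("template"):
        title = tpl.find("title")
        if title is None or not text_of(title).lower().startswith(prefix):
            continue
        params = {}
        for part in tpl.findall("part"):
            name, value = part.find("name"), part.find("value")
            if name is not None and value is not None:
                params[text_of(name)] = text_of(value)
        return params
    return None


print(infobox_params(SAMPLE))
```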

Robert

_______________________________________________
Wikitech-l mailing list
[email protected]
https://lists.wikimedia.org/mailman/listinfo/wikitech-l
