Hi Merlijn

Great, I will do so tonight. I have to say that it is *not* an attempt
to write a complete parser for wikitext, but rather a solution to
a very limited problem that I encountered. This means that it can
find templates and parse them into key-value pairs, and there is also
some code that can parse Image/File tags. However, it is not a complete
parser: for example, it does not parse headings as DrTrigon asked;
it mostly handles templates at the moment. Also, there is currently no
support for unnamed parameters.
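
To illustrate the general idea of non-regex template parsing, here is a
minimal sketch. It is not the code mentioned above; the function names
(find_templates, split_top_level, parse_template) are made up for this
example, and it assumes templates are delimited by plain {{ and }} pairs
(triple-brace parameters, comments and nowiki sections are ignored).

```python
def find_templates(text):
    """Return (start, end) spans of top-level {{...}} templates."""
    spans = []
    depth = 0
    start = None
    i = 0
    while i < len(text) - 1:
        pair = text[i:i + 2]
        if pair == "{{":
            if depth == 0:
                start = i
            depth += 1
            i += 2
        elif pair == "}}" and depth > 0:
            depth -= 1
            if depth == 0:
                spans.append((start, i + 2))
            i += 2
        else:
            i += 1
    return spans


def split_top_level(body, sep="|"):
    """Split body on sep, ignoring separators inside {{...}} or [[...]]."""
    parts, current, depth, i = [], [], 0, 0
    while i < len(body):
        pair = body[i:i + 2]
        if pair in ("{{", "[["):
            depth += 1
            current.append(pair)
            i += 2
        elif pair in ("}}", "]]") and depth > 0:
            depth -= 1
            current.append(pair)
            i += 2
        elif body[i] == sep and depth == 0:
            parts.append("".join(current))
            current = []
            i += 1
        else:
            current.append(body[i])
            i += 1
    parts.append("".join(current))
    return parts


def parse_template(template_text):
    """Parse '{{Name|key=value|...}}' into (name, dict of named params)."""
    inner = template_text[2:-2]
    parts = split_top_level(inner)
    name = parts[0].strip()
    params = {}
    for part in parts[1:]:
        key, eq, value = part.partition("=")
        if eq:  # named parameters only; unnamed ones are skipped
            params[key.strip()] = value.strip()
    return name, params


text = ("Intro {{Infobox Chemie | Name = Wasser "
        "| Bild = {{Bild|a=1|b=2}} }} end")
start, end = find_templates(text)[0]
name, params = parse_template(text[start:end])
```

Tracking the nesting depth is what lets the nested {{Bild|...}} template
stay intact as the value of "Bild" instead of being split at its inner
pipes. Like the code described above, this sketch skips unnamed
parameters.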

However, it might be a starting point for further work. I also did not
find a formal specification for wikitext, so it was a lot of learning by
doing. Still, I used it successfully on ~4k "Infobox Chemie"
templates in the de-wiki.

Hannes

On 24 January 2012 09:55, Dr. Trigon <[email protected]> wrote:
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
>
> Hello Hannes
>
> Just wondering; is your text parser able to correctly find all headings
> (e.g. '== bla ==' as well as '<h2>bla</h2>') and distinguish headings
> from other similar text but within a paragraph? And finally return the
> byte offset of those headings?
>
> I am using such a piece of code, written with the help of difflib;
> maybe it would be useful here as well? (even though I have not had much
> time to write a unittest with full coverage... but a simple one is there ;)
>
> Greetings
> DrTrigon
>
>
> On 23.01.2012 23:34, Hannes Röst wrote:
>> Hello all
>>
>> From one of my assignments as a bot operator I have some code which
>> does template parsing and general text parsing (e.g. Image/File
>> tags). It is not using regex and thus able to correctly parse
>> nested templates and other such nasty things. I have written those
>> as library classes and written tests for them which cover almost
>> all of the code. I would now really like to contribute that code
>> back to the community.
>>
>> Would you be interested in adding this code to the pywikibot
>> framework? If yes, can I send the code to someone for code review
>> or how do you usually operate?
>>
>> Greetings
>>
>> Hannes
>>
>> PS: wiki userpage is
>> http://en.wikipedia.org/wiki/User:Hannes_R%C3%B6st
>>
>> _______________________________________________ Pywikipedia-l
>> mailing list [email protected]
>> https://lists.wikimedia.org/mailman/listinfo/pywikipedia-l
>
> -----BEGIN PGP SIGNATURE-----
> Version: GnuPG v1.4.11 (GNU/Linux)
> Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org/
>
> iEYEARECAAYFAk8eceUACgkQAXWvBxzBrDBmJQCePmfUbs4Y8HNN18UT6vMFYo5r
> N1AAoLuN1VLpZQOrwegmkKWl08Te0Rxp
> =HXai
> -----END PGP SIGNATURE-----
