Hi, thank you.

Where can I find documentation or an example showing how to extract links with
https://github.com/earwig/mwparserfromhell
or
https://github.com/wikimedia/pywikibot-core/blob/master/pywikibot/xmlreader.py
?

I'd be very grateful if you could point me to an example of link extraction
and redirect handling.
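
To make the request concrete, this is roughly the behaviour I am after,
written as a minimal standard-library sketch. It is not mwparserfromhell or
pywikibot code: the regular expressions and the extract_links name are my own
assumptions, and they ignore templates, nowiki sections, nested links and
bare URLs.

```python
import re

# Simplified, hypothetical patterns (assumptions, not a full wikitext grammar).
# [[Target]] or [[Target|label]]: capture only the target part.
WIKILINK_RE = re.compile(r"\[\[([^\[\]|]+)(?:\|[^\[\]]*)?\]\]")
# [http://host/path optional label]: a single bracket, not a wikilink.
EXTERNAL_RE = re.compile(r"(?<!\[)\[(https?://[^\s\]]+)[^\]]*\]")


def extract_links(wikitext):
    """Split the links found in one page's wikitext into three lists."""
    links = {"internal": [], "category": [], "external": []}
    for match in WIKILINK_RE.finditer(wikitext):
        target = match.group(1).strip()
        if target.lower().startswith("category:"):
            # [[Category:Foo]] puts the page into a category.
            links["category"].append(target)
        else:
            # A leading colon, as in [[:Category:Foo]], marks a plain link.
            links["internal"].append(target.lstrip(":"))
    links["external"] = EXTERNAL_RE.findall(wikitext)
    return links
```

Called on the wikitext of one page, it returns three lists, so category
links stay separate from ordinary internal and external links.
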
Should I run them against the XML dump, or as a bot against api.wikimedia?
I would like to work offline, but mwparserfromhell seems to be used online,
against api.wikipedia.
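
For the offline case, I imagine something along these lines: a rough sketch
that streams the dump with the standard library only and follows redirect
chains. It assumes the dump marks redirects with a <redirect title="..."/>
element, as recent export schemas appear to do; iter_pages and
resolve_redirect are hypothetical names, not part of either library.

```python
import xml.etree.ElementTree as ET


def local_name(tag):
    """Drop the '{namespace}' prefix ElementTree puts on dump tags."""
    return tag.rsplit("}", 1)[-1]


def iter_pages(source):
    """Yield (title, redirect_target_or_None, text) for each <page>.

    `source` is a file name or a binary file object holding the XML dump.
    """
    for _event, elem in ET.iterparse(source, events=("end",)):
        if local_name(elem.tag) != "page":
            continue
        title, target, text = None, None, ""
        for node in elem.iter():
            name = local_name(node.tag)
            if name == "title":
                title = node.text
            elif name == "redirect":
                # Assumption: the target is stored in the title attribute.
                target = node.get("title")
            elif name == "text":
                text = node.text or ""
        yield title, target, text
        elem.clear()  # release the page's children to limit memory use


def resolve_redirect(title, redirects, max_hops=10):
    """Follow title -> target entries until a non-redirect page is reached.

    Stops on a cycle or after max_hops steps and returns the last title seen.
    """
    seen = set()
    while title in redirects and title not in seen and len(seen) < max_hops:
        seen.add(title)
        title = redirects[title]
    return title
```

A first pass over iter_pages could build a dict mapping each redirect title
to its target, and a second pass could call resolve_redirect on every link
target, all without touching the live wiki.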

Where is the documentation for the scripts on mediawiki.org?
https://www.mediawiki.org/w/index.php?search=xmlparser&title=Special%3ASearch&go=Go

Thank you!



On Mon, Jan 18, 2016 at 8:05 PM, Morten Wang <[email protected]> wrote:

> An alternative is to use Aaron Halfaker's mediawiki-utilities (
> https://pypi.python.org/pypi/mediawiki-utilities) together with
> mwparserfromhell (https://github.com/earwig/mwparserfromhell) to parse the
> wikitext and extract the links. The latter is already part of pywikibot,
> though.
>
>
> Cheers,
> Morten
>
>
> On 18 January 2016 at 10:45, Amir Ladsgroup <[email protected]> wrote:
>
>> Hey,
>> There is a really good module in pywikibot called
>> xmlreader.py
>> <https://github.com/wikimedia/pywikibot-core/blob/master/pywikibot/xmlreader.py>.
>> Documentation generated from the source code is also available
>> <https://doc.wikimedia.org/pywikibot/api_ref/pywikibot.html#module-pywikibot.xmlreader>.
>> You can read the source code and write your own script. Some scripts also
>> support xmlreader; read their manuals on mediawiki.org.
>>
>> Best
>>
>> On Mon, Jan 18, 2016 at 10:00 PM Luigi Assom <[email protected]>
>> wrote:
>>
>>> Hello!
>>> A question about using pywikibot:
>>> is it possible to use it to parse the XML dump?
>>>
>>> I am interested in extracting links from pages (internal and external,
>>> kept distinct from category links).
>>> I would also like to handle transitive redirects.
>>> I would like to process the dump without accessing the wiki, or else
>>> access the wiki in batches with proper limits.
>>>
>>> Is there maybe something in the package that already takes care of this?
>>> I've seen that https://www.mediawiki.org/wiki/Manual:Pywikibot/Scripts
>>> lists a "ghost" extracting_links.py script.
>>> I wanted to ask, before reinventing the wheel, whether pywikibot is a
>>> suitable tool for the purpose.
>>>
>>> Thank you,
>>> L.
>>> _______________________________________________
>>> pywikibot mailing list
>>> [email protected]
>>> https://lists.wikimedia.org/mailman/listinfo/pywikibot
>>>
>>
>