Hi, thank you. Where can I find documentation or an example for extracting links with https://github.com/earwig/mwparserfromhell or https://github.com/wikimedia/pywikibot-core/blob/master/pywikibot/xmlreader.py ?
I'd be very grateful if you could point me to an example of link extraction and redirect handling. Should I run them against the XML dump, or as a bot against api.wikimedia? I would like to work offline, but mwparserfromhell seems to work online against api.wikipedia.

Where is the documentation for the scripts on mediawiki.org?
https://www.mediawiki.org/w/index.php?search=xmlparser&title=Special%3ASearch&go=Go

Thank you!

On Mon, Jan 18, 2016 at 8:05 PM, Morten Wang <[email protected]> wrote:

> An alternative is Aaron Halfaker's mediawiki-utilities (
> https://pypi.python.org/pypi/mediawiki-utilities) and mwparserfromhell (
> https://github.com/earwig/mwparserfromhell) to parse the wikitext to
> extract the links, the latter is already a part of pywikibot, though.
>
>
> Cheers,
> Morten
>
>
> On 18 January 2016 at 10:45, Amir Ladsgroup <[email protected]> wrote:
>
>> Hey,
>> There is a really good module implemented in pywikibot called
>> xmlreader.py
>> <https://github.com/wikimedia/pywikibot-core/blob/master/pywikibot/xmlreader.py>.
>> Also a help is built based on the source code
>> <https://doc.wikimedia.org/pywikibot/api_ref/pywikibot.html#module-pywikibot.xmlreader>
>> You can read the source code and write your own script. Some scripts also
>> support xmlreader, read the manual for them in mediawiki.org
>>
>> Best
>>
>> On Mon, Jan 18, 2016 at 10:00 PM Luigi Assom <[email protected]>
>> wrote:
>>
>>> hello hello!
>>> about the use of pywikibot:
>>> is it possible to use it to parse the xml dump?
>>>
>>> I am interested in extracting links from pages (internal, external, with
>>> distinction from ones belonging to category).
>>> I also would like to handle transitive redirects.
>>> I would like to process the dump without accessing the wiki, or else access
>>> the wiki with proper limits in batch.
>>>
>>> Is there maybe something in the package already taking care of this ?
>>> I've seen in https://www.mediawiki.org/wiki/Manual:Pywikibot/Scripts
>>> there is a "ghost" extracting_links.py script,
>>> I wanted to ask before re-inventing the wheel, and if pywikibot is
>>> a suitable tool for the purpose.
>>>
>>> Thank you,
>>> L.
>>> _______________________________________________
>>> pywikibot mailing list
>>> [email protected]
>>> https://lists.wikimedia.org/mailman/listinfo/pywikibot
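
For reference, here is a rough offline sketch of the kind of processing asked about in this thread: reading pages from an XML dump, splitting links into internal, category and external, and following redirects transitively. It is not how pywikibot's xmlreader or mwparserfromhell do it. It uses only the Python standard library, and regular expressions stand in for a real wikitext parser, so nested templates, bare URLs and similar cases are not handled. The names iter_pages, extract_links, resolve and the SAMPLE dump are made up for illustration, and it assumes a dump with one revision per page. As far as I know, mwparserfromhell itself only parses a wikitext string and needs no network access, so it could replace the regexes here.

```python
import io
import re
import xml.etree.ElementTree as ET

# Rough patterns (an assumption of this sketch); a real wikitext parser
# handles nesting and templates far better than these regexes do.
WIKILINK = re.compile(r"\[\[([^\[\]|#]+)(?:#[^\[\]|]*)?(?:\|[^\[\]]*)?\]\]")
EXTLINK = re.compile(r"(?<!\[)\[(https?://[^\s\]]+)")
REDIRECT = re.compile(r"^\s*#REDIRECT\s*\[\[([^\[\]|#]+)", re.IGNORECASE)


def local(tag):
    """Strip the XML namespace that MediaWiki exports put on every tag."""
    return tag.rsplit("}", 1)[-1]


def iter_pages(source):
    """Yield (title, wikitext) for each <page> in a dump file or file object."""
    title, text = None, ""
    for _, elem in ET.iterparse(source, events=("end",)):
        name = local(elem.tag)
        if name == "title":
            title = elem.text
        elif name == "text":
            text = elem.text or ""
        elif name == "page":
            yield title, text
            title, text = None, ""
            elem.clear()  # drop the page's children to limit memory use


def extract_links(wikitext):
    """Split bracketed links into internal, category and external targets."""
    internal, categories = [], []
    for match in WIKILINK.finditer(wikitext):
        target = match.group(1).strip()
        if target.lower().startswith("category:"):
            categories.append(target)
        else:
            internal.append(target)
    external = EXTLINK.findall(wikitext)
    return {"internal": internal, "category": categories, "external": external}


def resolve(title, redirects):
    """Follow a redirect chain to its end, stopping if it loops."""
    seen = set()
    while title in redirects and title not in seen:
        seen.add(title)
        title = redirects[title]
    return title


# Tiny hand-written stand-in for a real dump file.
SAMPLE = """<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.10/">
  <page><title>Roma</title><revision><text>#REDIRECT [[Rome]]</text></revision></page>
  <page><title>Urbs</title><revision><text>#REDIRECT [[Roma]]</text></revision></page>
  <page><title>Rome</title><revision><text>Capital of [[Italy]], on the [[Tiber|river Tiber]].
[http://example.org Official site] [[Category:Capitals]]</text></revision></page>
</mediawiki>"""

redirects, links = {}, {}
for page_title, page_text in iter_pages(io.BytesIO(SAMPLE.encode("utf-8"))):
    found = REDIRECT.match(page_text)
    if found:
        redirects[page_title] = found.group(1).strip()
    else:
        links[page_title] = extract_links(page_text)

print(resolve("Urbs", redirects))
print(links["Rome"])
```

For a real dump, the BytesIO object would be replaced by the path to the decompressed XML file (or a bz2.open file object), since iterparse reads it incrementally rather than loading it whole.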
_______________________________________________ pywikibot mailing list [email protected] https://lists.wikimedia.org/mailman/listinfo/pywikibot
