matej_suchanek created this task.
matej_suchanek added projects: Pywikibot, Pywikibot-Wikidata, Regression,
Performance.
Restricted Application added subscribers: pywikibot-bugs-list, Aklapper.
TASK DESCRIPTION
Run this code:
>>> import pywikibot
>>> repo = pywikibot.Site('wikidata', 'wikidata')
>>> item = pywikibot.ItemPage(repo, 'Q16503')
>>> data = item.get()
The last line takes many seconds, noticeably longer than the corresponding API
call <https://www.wikidata.org/w/api.php?action=wbgetentities&format=json&ids=Q16503&redirects=yes&props=info%7Csitelinks%7Caliases%7Clabels%7Cdescriptions%7Cclaims%7Cdatatype>
takes on its own. The reason is that during this operation all sitelinks are
initialized AND (some of them) parsed in `SiteLink._parse_namespace`, which a)
creates a new site object via `APISite.fromDBName` (not a cached one, as
`pywikibot.Site` would return), and b) makes an API call for each site to get
its namespace information (this can be very slow for many sites). Note that the
combination of both caused my bot to crash with a `MemoryError`, with the
traceback pointing to these methods.
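The cost described above can be illustrated without network access. This is a
minimal sketch under stated assumptions: `DemoSite`, `uncached_site` and
`cached_site` are hypothetical names, not Pywikibot API. It counts how often a
site object is built and how often its namespace data is requested, comparing a
fresh object per call against one object cached per database name with
`functools.lru_cache`.

```python
from functools import lru_cache

# Hypothetical counters standing in for expensive work (not Pywikibot API).
CONSTRUCTED = []
NAMESPACE_REQUESTS = []


class DemoSite:
    """Hypothetical stand-in for a site object (not pywikibot.APISite)."""

    def __init__(self, dbname):
        self.dbname = dbname
        CONSTRUCTED.append(dbname)  # record every object construction
        self._namespaces = None

    @property
    def namespaces(self):
        # Simulates the per-site API request for namespace info;
        # each object fetches it once and then reuses it.
        if self._namespaces is None:
            NAMESPACE_REQUESTS.append(self.dbname)
            self._namespaces = {0: '', 14: 'Category'}
        return self._namespaces


def uncached_site(dbname):
    # Mirrors the reported behaviour: a fresh object on every call,
    # so the namespace request is repeated every time.
    return DemoSite(dbname)


@lru_cache(maxsize=None)
def cached_site(dbname):
    # One object per dbname, reused across calls (similar in spirit
    # to the caching that pywikibot.Site performs).
    return DemoSite(dbname)
```

With the uncached factory, three lookups of `enwiki` build three objects and
issue three namespace requests; with the cached factory, they build one object
and issue one request.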
All of this is unexpected for bot operators who don't care about sitelinks
(or who do, but not about which namespace they link to). Some lazy
initialization should be introduced, probably in all of `fromDBName`,
`SiteLink` and `ItemPage`.
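One possible shape for such lazy initialization, offered only as a sketch:
`LazySiteLink`, `load_site` and `parse_sitelinks` are hypothetical names, and
`load_site` merely stands in for the expensive `fromDBName` plus namespace
lookup. Each sitelink keeps the raw strings from the API response and resolves
its site and namespace through `functools.cached_property` (Python 3.8+) only
on first access.

```python
from functools import cached_property

# Hypothetical record of how often a site had to be resolved.
SITE_LOADS = []


def load_site(dbname):
    """Hypothetical expensive lookup (stands in for fromDBName + namespaces)."""
    SITE_LOADS.append(dbname)
    return {'dbname': dbname, 'namespaces': {'Category': 14, 'Template': 10}}


class LazySiteLink:
    """Sketch of a sitelink that stores raw data and resolves details on demand."""

    def __init__(self, dbname, title):
        # Cheap: only keep the raw strings from the API response.
        self.dbname = dbname
        self.title = title

    @cached_property
    def site(self):
        # Resolved on first access only, then cached on the instance.
        return load_site(self.dbname)

    @cached_property
    def namespace(self):
        # Parse the namespace prefix only when someone asks for it.
        # Titles without a colon never need the site at all.
        prefix, sep, _ = self.title.partition(':')
        if sep and prefix in self.site['namespaces']:
            return self.site['namespaces'][prefix]
        return 0


def parse_sitelinks(raw):
    # Building the mapping touches no site objects at all.
    return {db: LazySiteLink(db, data['title']) for db, data in raw.items()}
```

Building the sitelinks for an item then costs no site lookups; only code that
actually reads `.namespace` on a prefixed title pays for it, once per link.
Caching the site lookup itself by database name would avoid repeating it across
items.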
TASK DETAIL
https://phabricator.wikimedia.org/T226157