Carlb created this task.
Carlb added projects: Pywikibot, Pywikibot-interwiki.py.
Restricted Application added subscribers: pywikibot-bugs-list, revi, Aklapper.

TASK DESCRIPTION
  This should probably be tagged as "Pywikibot-interwikidata.py" but there 
doesn't seem to be an available tag for that item.
  
  The Wikibase extension and the Pywikibot-interwikidata.py script both contain 
strict hard-coded assumptions which, while likely valid on WMF wikis, may break 
on third-party wikis:
  
  - T172076 <https://phabricator.wikimedia.org/T172076>: The code assumes that 
the GlobalID naming convention will be (language code)+(group name) with any 
hyphens replaced with underscores. It also hard-codes an assumption that the 
(group name) will always be "wiki*" or "wiktionary" (as WMF project names) and 
that removing that trailing group name will yield the local language code.
  - T221550 <https://phabricator.wikimedia.org/T221550>: The API and core 
code assume the local database name (wikiID) can be reported to API clients as 
a presumed-standard GlobalID which is consistent in format, unique across that 
entire project and follows all naming conventions. (This won't be fixed at the 
API level until GlobalID exists in core MW code and, even then, good luck 
getting externally-hosted projects to update their configs.)
  - T221556 <https://phabricator.wikimedia.org/T221556>: Furthermore, 
interwikidata.py assumes there are no individual language wikis in the group 
which are independently hosted (or which lack access to the common repository). 
The script takes a list of interwikis from the article and makes an API query 
for each to see if it's already linked to an item, so that it may treat 
anything linked to some other Wikibase Q-item as a conflict. Unfortunately, if 
the API responds that there is no Wikibase at all behind one language's site, 
the script does not even attempt to handle this condition and immediately 
exits, when the proper behaviour should be to treat a "We don't have a lord. 
We're an autonomous collective." response as there being no conflicting Q-item 
link on the remote wiki (so OK, no error).
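  
  The behaviour proposed in the last item could look roughly like the Python 
sketch below. It is not the actual interwikidata.py code: NoRepositoryError, 
find_conflicts and the lookup_item callable are hypothetical names used only 
to illustrate treating "no Wikibase behind this site" as "no conflict".

```python
class NoRepositoryError(Exception):
    """Hypothetical error: the remote wiki has no Wikibase repository."""


def find_conflicts(linked_sites, lookup_item, local_item=None):
    """Return {site: item_id} for sites already linked to a different item.

    lookup_item is a hypothetical callable taking a site and returning its
    linked Q-item id (or None); it raises NoRepositoryError when the remote
    wiki has no repository at all.
    """
    conflicts = {}
    for site in linked_sites:
        try:
            item_id = lookup_item(site)
        except NoRepositoryError:
            # No Wikibase behind this language: nothing to conflict with.
            continue
        if item_id is not None and item_id != local_item:
            conflicts[site] = item_id
    return conflicts
```

  With this shape, an independently hosted language is skipped instead of 
aborting the whole run, and only genuine links to a different Q-item are 
reported.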
  
  Even if these issues are fixed locally, one problem remains: any 
externally-hosted wikis will be returning their local database name as WikiID - 
and that won't match the GlobalID.
  
  That's happening because interwikidata.py presumes the API is providing a 
GlobalID, while the API presumes there is no GlobalID support in core and 
returns the local database name. That's a design flaw; there are workarounds in 
other places (such as 
$wgWBRepoSettings['localClientDatabases']['ptuncyc']='uncyc_pt'; in the 
Wikibase-repo extension config) but there's no table available to 
pywikibot/site.py to map the API-reported WikiID (the local database name) to 
the GlobalID; site.py blindly expects the WikiID to actually be the GlobalID, 
always.
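  
  To make the mismatch concrete, here is a minimal Python sketch of the naming 
convention described above (language code plus group name, hyphens replaced 
with underscores). The function name expected_global_id and the group name 
'uncyc' are assumptions for illustration, inferred from the 'ptuncyc' example; 
this is not Pywikibot or Wikibase code.

```python
def expected_global_id(lang_code, group):
    """Build a GlobalID as (language code)+(group name), hyphens -> underscores."""
    return (lang_code + group).replace('-', '_')


# 'uncyc' as the group name is an assumption taken from the 'ptuncyc' example.
expected = expected_global_id('pt', 'uncyc')  # GlobalID the script expects
reported = 'uncyc_pt'                         # local database name from the API

# The two identifiers differ, so a lookup keyed on the GlobalID fails.
ids_match = (expected == reported)
```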
  
  Steps to Reproduce:
  
  Install and try to run Pywikibot-interwikidata.py on Uncyclopedia. (This will 
require patching code to address T221556 
<https://phabricator.wikimedia.org/T221556> first, which I shall not address 
here, and the "home wiki" for the bot will need to be set to one of the 
languages which has access to the repository.)
  
  There's a (somewhat-broken) Wikidata repository on *.uncyclopedia.info but 
the project is a mess of independently-hosted languages (such as Russian, 
Polish, Korean), items on external wiki farms (Italian is on Miraheze?) and 
entire clusters of wikis (*.uncyclopedia.co) which are separate from anything 
on the repo.
  
  In theory, the Wikibase extension code should be capable of creating an 
outbound inter-language link to an externally-hosted project if its page and 
API links are in the `sites` table. In practice, everything still goes haywire 
even after the other bugs listed above have been patched (or kludged, or worked 
around...) as the wikiID being reported by the individual external projects 
seems to vary widely, depending on who is hosting each individual language.
  
  Actual Results:
  
  Every time a link to the externally-hosted site is found, if the site's 
API-reported database name doesn't match the expected GlobalID, the script will 
report "Unknown site:" and the database name reported by the remote API. This 
prevents the script from creating outbound interlanguage links to that specific 
externally-hosted site.
  
  Expected Results:
  
  The only easy way to get the desired result (the script can make 
outbound-only links to externally-hosted languages, even if that doesn't 
generate a backlink from the external site) is to add a translation table to be 
consulted in pywikibot/site.py - something like:
  
    def dbName(self):
        """Return this site's internal id."""
        # Map the wikiid reported by each external host to the expected GlobalID.
        wikiIDmap = {
            'uncy_cs': 'csuncyc',
            'uncy_de': 'deuncyc',
            'uncy_en': 'enuncyc',
            'uncy_es': 'esuncyc',
            'uncy_fr': 'fruncyc',
            'uncy_he': 'heuncyc',
            'uncy_un': 'en_gbuncyc',
            'engbuncyc': 'en_gbuncyc',
            'zhtwuncyc': 'zh_twuncyc',
            'beidipediawiki': 'aruncyc',
            'nonciclopediawiki': 'ituncyc',
            'uncyclopediawiki': 'zh_cnuncyc',
            'uncyclo_pedia': 'kouncyc',
            'nonsensopedia': 'pluncyc',
            'absurd': 'ruuncyc',
        }
        # Fall back to the reported wikiid when no mapping is defined.
        wikiid = self.siteinfo['wikiid']
        return wikiIDmap.get(wikiid, wikiid)
  
  instead of the original (pywikibot/site.py lines 2727-2729):
  
    def dbName(self):
        """Return this site's internal id."""
        return self.siteinfo['wikiid']
  
  This is a kludge. Ultimately, the wikiIDmap needs to exist as part of the 
configuration file, perhaps user-config.py or user-added to the generated 
uncyclopedia-family.py file.
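  
  As a rough sketch of that longer-term shape, the map could be passed in from 
configuration rather than embedded in the method. Everything named here is 
hypothetical: Pywikibot does not define a wikiid_map setting or a 
resolve_db_name helper, and the entries are a subset of the table above.

```python
# Hypothetical setting, e.g. defined in user-config.py or a family file.
wikiid_map = {
    'uncy_en': 'enuncyc',
    'nonsensopedia': 'pluncyc',
    'absurd': 'ruuncyc',
}


def resolve_db_name(reported_wikiid, mapping=None):
    """Translate an API-reported wikiid to a GlobalID using a configured map.

    Sites without an entry (or with no map configured) keep the reported
    value, so behaviour on WMF-style wikis is unchanged.
    """
    mapping = mapping or {}
    return mapping.get(reported_wikiid, reported_wikiid)
```

  dbName() would then only need to call something like 
resolve_db_name(self.siteinfo['wikiid'], wikiid_map), keeping site-specific 
data out of pywikibot/site.py.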
  
  The current code is relying on the API to be returning GlobalID and the 
GlobalID concept (per T221550 <https://phabricator.wikimedia.org/T221550>) 
simply doesn't exist in the API because it doesn't exist in core code. WMF is a 
closed, controlled environment where the local database names follow one 
specific, known pattern that matches the GlobalID. A third-party external site? 
Don't count on anything.

TASK DETAIL
  https://phabricator.wikimedia.org/T222021

To: Carlb
Cc: Aklapper, revi, pywikibot-bugs-list, Carlb, DannyS712, Wenyi, Tbscho, MayS, 
Mdupont, JJMC89, Avicennasis, Thibaut120094, mys_721tx, jayvdb, Dalba, Masti, 
Alchimista, Rxy