[Wikidata-bugs] [Maniphest] T286861: scripts/interwikidata.py returns a InconsistentTitleError on a Malayalam title

2021-07-18 Thread JJMC89
JJMC89 removed a project: Wikidata.

TASK DETAIL
  https://phabricator.wikimedia.org/T286861

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: JJMC89
Cc: Xqt, santhosh, Urbanecm, Aklapper, Amire80, pywikibot-bugs-list, Jyoo1011, 
JohnsonLee01, SHEKH, Dijkstra, Khutuck, Zkhalido, Viztor, Wenyi, Tbscho, MayS, 
Mdupont, JJMC89, Dvorapa, Altostratus, Avicennasis, mys_721tx, jayvdb, Masti, 
Alchimista, Invadibot, maantietaja, Akuckartz, Nandana, Lahi, Gq86, 
GoranSMilovanovic, QZanden, LawExplorer, _jensen, rosalieper, Scott_WUaS, 
Wikidata-bugs, aude, Mbch331
___
Wikidata-bugs mailing list -- wikidata-bugs@lists.wikimedia.org
To unsubscribe send an email to wikidata-bugs-le...@lists.wikimedia.org


2021-07-18 Thread Amire80
Amire80 added a comment.


  Aha, so as I suspected, it is probably a Malayalam encoding issue. About 
eleven years ago, the way Malayalam final letters (chillus) are encoded 
changed. I'd expect MediaWiki's Unicode handling to treat the old and new 
encodings as compatible automatically rather than throw an error, but I'm not 
an expert in the details of how that works.
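
  A minimal sketch of the two encodings, assuming the standard Malayalam chillu mapping. The older form writes a chillu as base consonant + virama (U+0D4D) + ZERO WIDTH JOINER (U+200D); the newer form uses a single atomic code point in the range U+0D7A to U+0D7F. `to_atomic_chillu` is a hypothetical helper for illustration, not a Pywikibot or MediaWiki function. The InconsistentTitleError in this task shows that the wiki answered with the atomic form, so some similar folding evidently happens on the server side.

```python
# Hypothetical helper for illustration; not part of Pywikibot or MediaWiki.
VIRAMA = "\u0d4d"  # MALAYALAM SIGN VIRAMA
ZWJ = "\u200d"     # ZERO WIDTH JOINER

# Base consonant -> atomic chillu code point (U+0D7A..U+0D7F).
ATOMIC_CHILLU = {
    "\u0d23": "\u0d7a",  # NNA -> CHILLU NN
    "\u0d28": "\u0d7b",  # NA  -> CHILLU N
    "\u0d30": "\u0d7c",  # RA  -> CHILLU RR
    "\u0d32": "\u0d7d",  # LA  -> CHILLU L
    "\u0d33": "\u0d7e",  # LLA -> CHILLU LL
    "\u0d15": "\u0d7f",  # KA  -> CHILLU K
}


def to_atomic_chillu(text: str) -> str:
    """Fold consonant + virama + ZWJ sequences into atomic chillus."""
    for base, chillu in ATOMIC_CHILLU.items():
        text = text.replace(base + VIRAMA + ZWJ, chillu)
    return text


# Older-form 'Pushkin' (ends in NA + virama + ZWJ) -> ends in atomic CHILLU N.
legacy = "\u0d2a\u0d41\u0d37\u0d4d\u0d15\u0d3f\u0d28\u0d4d\u200d"
print(to_atomic_chillu(legacy) == "\u0d2a\u0d41\u0d37\u0d4d\u0d15\u0d3f\u0d7b")  # True
```

  Only sequences that end in ZWJ are rewritten, so ordinary conjuncts (a virama with no following ZWJ, as in the middle of the surname) pass through unchanged.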

2021-07-18 Thread Xqt
Xqt added a comment.


  The warning is a MediaWiki API issue.
  
  The page title on the left contains three code points for `'ന്‍'`, which is 
`'ന'` + `'്'` + `'\u200d'`, whereas on the right side `'ൻ'` is `chr(3451)` 
(U+0D7B, MALAYALAM LETTER CHILLU N). A wrong title must be passed somewhere.
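
  Xqt's breakdown can be reproduced with Python's standard `unicodedata` module. This sketch lists the code points on each side and checks whether NFC, the normalization form the API warning asks for, reconciles them. The strings are written with escapes on purpose, because the ZERO WIDTH JOINER is invisible and is easily lost when copying titles.

```python
import unicodedata

legacy = "\u0d28\u0d4d\u200d"  # left side: NA + VIRAMA + ZERO WIDTH JOINER
atomic = chr(3451)             # right side: U+0D7B

for label, text in (("left", legacy), ("right", atomic)):
    names = [f"U+{ord(c):04X} {unicodedata.name(c)}" for c in text]
    print(label, len(text), names)

# The atomic chillu has no canonical decomposition, so NFC does not
# compose the three-code-point sequence into it.
print(unicodedata.normalize("NFC", legacy) == atomic)  # False
```

  Since plain NFC leaves the two spellings distinct, the fact that the API returned `chr(3451)` suggests MediaWiki applies an additional Malayalam-specific conversion; that is an inference from the error, not something this thread confirms.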

2021-07-18 Thread Amire80
Amire80 created this task.
Amire80 added projects: Pywikibot, Wikidata.
Restricted Application added subscribers: pywikibot-bugs-list, Aklapper.

TASK DESCRIPTION
  I am running this:
  
  `python3 pwb.py scripts/interwikidata.py -lang:shi -clean -page:'Aliksandṛ Puckin' -always`
  
  On this version of the article: 
https://shi.wikipedia.org/w/index.php?title=Aliksand%E1%B9%9B_Puckin=27245
  
  I am receiving this error:
  
>>> Aliksandṛ Puckin <<<
WARNING: API warning (query): The value passed for "titles" contains 
invalid or non-normalized data. Textual data should be valid, NFC-normalized 
Unicode without C0 control characters other than HT (\t), LF (\n), and CR (\r).

0 pages read
0 pages written
0 pages skipped
Execution time: 7 seconds
Script terminated by exception:

ERROR: InconsistentTitleError: Query on [[ml:അലക്സാണ്ടര്‍ പുഷ്കിന്‍]] 
returned data on 'അലക്സാണ്ടർ പുഷ്കിൻ'
Traceback (most recent call last):
  File "/Users/aaharoni/dev/pywikibot-he/pwb.py", line 399, in <module>
    if not main():
  File "/Users/aaharoni/dev/pywikibot-he/pwb.py", line 391, in main
    run_python_file(filename,
  File "/Users/aaharoni/dev/pywikibot-he/pwb.py", line 106, in run_python_file
    exec(compile(source, filename, 'exec', dont_inherit=True),
  File "./scripts/interwikidata.py", line 249, in <module>
    main()
  File "./scripts/interwikidata.py", line 243, in main
    bot.run()
  File "/Users/aaharoni/dev/pywikibot-he/pywikibot/bot.py", line 1558, in run
    self.treat(page)
  File "/Users/aaharoni/dev/pywikibot-he/pywikibot/bot.py", line 1810, in treat
    self.treat_page()
  File "./scripts/interwikidata.py", line 95, in treat_page
    item = self.try_to_add()
  File "./scripts/interwikidata.py", line 177, in try_to_add
    wd_data = self.get_items()
  File "./scripts/interwikidata.py", line 164, in get_items
    if not iw_page.exists():
  File "/Users/aaharoni/dev/pywikibot-he/pywikibot/page/__init__.py", line 716, in exists
    return self.pageid > 0
  File "/Users/aaharoni/dev/pywikibot-he/pywikibot/page/__init__.py", line 261, in pageid
    self.site.loadpageinfo(self)
  File "/Users/aaharoni/dev/pywikibot-he/pywikibot/site/_apisite.py", line 1107, in loadpageinfo
    self._update_page(page, query)
  File "/Users/aaharoni/dev/pywikibot-he/pywikibot/site/_apisite.py", line 1084, in _update_page
    raise InconsistentTitleError(page, pageitem['title'])
pywikibot.exceptions.InconsistentTitleError: Query on [[ml:അലക്സാണ്ടര്‍ 
പുഷ്കിന്‍]] returned data on 'അലക്സാണ്ടർ പുഷ്കിൻ'
CRITICAL: Exiting due to uncaught exception 
  
  I guess that it has something to do with the value of the Malayalam string:
  
[[ml:അലക്സാണ്ടര്‍ പുഷ്കിന്‍]]
  
  The string appears to be valid; for example, it works if I paste it into the 
Malayalam Wikipedia. However, Malayalam is known to have occasional encoding 
issues, and it's possible that I'm missing something.
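
  One way to check which parts of a title like the one above use the older encoding is to search for a consonant followed by a virama and a ZERO WIDTH JOINER. In this sketch, `legacy_chillus` is a hypothetical helper, and the escaped `title` is a reconstruction of the ml title assuming both final letters are written the way Xqt describes; the exact code points stored on the wiki may differ.

```python
import re
import unicodedata

# Malayalam consonant (KA..HA), then SIGN VIRAMA, then ZERO WIDTH JOINER:
# the older way of writing a chillu.
LEGACY_CHILLU = re.compile("([\u0d15-\u0d39])\u0d4d\u200d")


def legacy_chillus(title):
    """Return (index, consonant name) for each old-style chillu in title."""
    return [(m.start(), unicodedata.name(m.group(1)))
            for m in LEGACY_CHILLU.finditer(title)]


# Hypothetical reconstruction of the ml title in the older encoding.
title = ("\u0d05\u0d32\u0d15\u0d4d\u0d38\u0d3e\u0d23\u0d4d\u0d1f\u0d30\u0d4d\u200d"
         " \u0d2a\u0d41\u0d37\u0d4d\u0d15\u0d3f\u0d28\u0d4d\u200d")
print(legacy_chillus(title))  # [(9, 'MALAYALAM LETTER RA'), (19, 'MALAYALAM LETTER NA')]
```

  Both hits line up with the two letters that come back as atomic chillus in the title the API returned.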
  
  pywikibot probably shouldn't crash because of it.
  
  Tagging #Wikidata because it may be related; please remove the tag if it's 
not.
