jayvdb created this task. jayvdb added subscribers: jayvdb, Multichill, Xqt, valhallasw. jayvdb added projects: pywikibot-core, Pywikibot-pagegenerators, Pywikibot-Wikidata. Herald added subscribers: pywikibot-bugs-list, Aklapper.
TASK DESCRIPTION

YearPageGenerator and DayPageGenerator need some love, and they present an interesting potential use of Wikidata. They are implemented using the date.py library, which hasn't been actively maintained.

* YearPageGenerator uses formatYear to generate a site-specific list of page titles for the year articles (even if the page doesn't exist).
* DayPageGenerator uses FormatDate and getNumberOfDaysInMonth to generate a site-specific list of page titles for each day in the year, always including 29 days in February, as no year is specified.

formatYear supports BC and AD years, and transforms each year into a localised page title, e.g. 2003 is pushed through '%F (میلادی)' for Farsi to become '۲۰۰۳ (میلادی)'. However, the correct transform only exists for some languages, and unknown languages cause a KeyError.

https://www.wikidata.org/wiki/Q1986 shows 2003 in many languages which are not supported by date.py. For example, Amharic (am) is [[https://am.wikipedia.org/wiki/2003_እ.ኤ.አ.|2003 እ.ኤ.አ.]]

```
>>> pywikibot.date.formatYear('am', 2003)
Traceback (most recent call last):
  File "<console>", line 1, in <module>
  File ".../pywikibot/date.py", line 2368, in formatYear
    return formats['YearAD'][lang](year)
KeyError: 'am'
```

date.py also appears to have incorrect transforms. For example, the transform for `jbo` is '%dmoi nanca', yet the Wikipedia page title is [[https://jbo.wikipedia.org/wiki/2003moi|2003moi]].

OTOH, Wikidata doesn't have a complete list of years in some languages, where the wiki doesn't have articles for all years. For example, there is no Gujarati (gu) article for 2003 in Wikidata, yet the date.py table works correctly.

```
>>> pywikibot.date.formatYear('gu', 2003)
'૨૦૦૩'
```

BC years vary much more, as there is no commonly adopted way to express negative years, so words and abbreviations are used instead; e.g. a '-' prefix or an 'a.C.' suffix are common.
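To make the "transform" idea concrete, here is a minimal sketch of a per-language AD year formatter. It is not date.py's implementation: `YEAR_AD`, `format_year` and `digit_table` are hypothetical names. The 'fa' and 'gu' entries mirror the outputs quoted above, and the 'am' and 'jbo' entries follow the Wikipedia titles linked from Q1986. Unknown languages return None instead of raising KeyError, so a generator could skip them rather than crash.

```python
from typing import Callable, Dict, Optional


def digit_table(zero: str) -> Dict[int, int]:
    """Map ASCII 0-9 onto the Unicode digit block whose zero is `zero`."""
    start = ord(zero)
    return str.maketrans('0123456789',
                         ''.join(chr(start + i) for i in range(10)))


PERSIAN_DIGITS = digit_table('\u06f0')   # Extended Arabic-Indic digits
GUJARATI_DIGITS = digit_table('\u0ae6')  # Gujarati digits

# Hypothetical AD table: one callable per language code.
YEAR_AD: Dict[str, Callable[[int], str]] = {
    'fa': lambda y: f'{y} (میلادی)'.translate(PERSIAN_DIGITS),
    'gu': lambda y: str(y).translate(GUJARATI_DIGITS),
    'am': lambda y: f'{y} እ.ኤ.አ.',
    'jbo': lambda y: f'{y}moi',
}


def format_year(lang: str, year: int) -> Optional[str]:
    """Return the localised title for an AD year, or None if `lang` is unknown."""
    if year < 1:
        raise ValueError('BC years need their own table')
    formatter = YEAR_AD.get(lang)
    return formatter(year) if formatter is not None else None


print(format_year('gu', 2003))  # ૨૦૦૩
print(format_year('xx', 2003))  # None
```

Keeping the digit transliteration separate from the pattern means a data set of plain patterns (e.g. '{} (میلادی)') plus a per-language digit block would be enough to regenerate the table.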
Also, the date.py metadata is very sketchy for BC years: languages supported for AD years are missing from the BC table, so BC lookups fail.

```
>>> pywikibot.date.formatYear('th', 2003)
'พ.ศ. 2546'
>>> pywikibot.date.formatYear('th', -1)
Traceback (most recent call last):
  File "/usr/lib64/python3.3/code.py", line 90, in runcode
    exec(code, self.locals)
  File "<console>", line 1, in <module>
  File ".../pywikibot/date.py", line 2366, in formatYear
    return formats['YearBC'][lang](-year)
KeyError: 'th'
```

With Wikidata it should be easy and efficient to confirm that this function works correctly for some sample years, by comparing the result for each language with the sitelinks, and to derive transforms for languages which do not work correctly. For languages with lots of Wikipedia year pages, we could extract all of the page names from the sitelinks, provided there are no missing pages. We could use #wikidataquery to get a list of all instances of year <https://www.wikidata.org/wiki/Q577>. @multichill, can we use #sparql for this?

More importantly, it would be good if the prefix/suffix/format string could be loaded from an externally maintained data set. Is there a TWN message or Wikidata item which has a multilingual format string for years? [[https://www.wikidata.org/wiki/Q159791|A.D.]] doesn't have any relevant statements.

However, it would also be worth investigating other libraries which might have good language support for formatted dates; we could then contribute this knowledge back to an external library which is better maintained. On T102174, @valhallasw suggested PyICU, which does appear to have reasonable language support: http://userguide.icu-project.org/formatparse/datetime. No doubt there are other libraries which are also worth investigating for suitability.

Also worth discussing: what are these two generators and date.py being used for? Or should we simply drop date.py and these two generators?
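The sitelink comparison proposed above could look roughly like the following sketch. The sitelinks are hard-coded from the examples in this task rather than fetched from Wikidata; `stale_format_year` is a hypothetical stand-in that reproduces the two date.py failures described above (a KeyError for 'am', the '%dmoi nanca' transform for 'jbo'); and `check_year`/`derive_template` are illustrative names, not pywikibot API. Deriving a '%d' template only works when the title contains the year in ASCII digits, so titles using localised digits are left for manual review.

```python
from typing import Callable, Dict, Optional

# Sample sitelinks for Q1986 (2003), copied from the examples in this task.
# A real check would fetch these from Wikidata instead of hard-coding them.
SITELINKS_2003 = {
    'amwiki': '2003 እ.ኤ.አ.',
    'jbowiki': '2003moi',
}


def derive_template(title: str, year: int) -> Optional[str]:
    """Turn a known title into a '%d' template if the year occurs once in ASCII digits."""
    digits = str(year)
    if title.count(digits) != 1:
        return None  # localised digits or ambiguous title: review by hand
    return title.replace(digits, '%d')


def check_year(formatter: Callable[[str, int], str], year: int,
               sitelinks: Dict[str, str]) -> Dict[str, Optional[str]]:
    """Return {lang: suggested template} for each Wikipedia the formatter gets wrong."""
    suggestions: Dict[str, Optional[str]] = {}
    for dbname, title in sitelinks.items():
        if not dbname.endswith('wiki'):
            continue  # skip Wikiquote, Wikisource, ... sitelinks
        # Simplification: some dbnames (e.g. with '_') differ from language codes.
        lang = dbname[:-len('wiki')]
        try:
            produced = formatter(lang, year)
        except KeyError:  # language missing from the table
            produced = None
        if produced != title:
            suggestions[lang] = derive_template(title, year)
    return suggestions


def stale_format_year(lang: str, year: int) -> str:
    """Hypothetical stand-in mimicking the date.py behaviour reported above."""
    return {'jbo': '%dmoi nanca'}[lang] % year


result = check_year(stale_format_year, 2003, SITELINKS_2003)
print(result['jbo'])  # %dmoi
```

Here 'am' gets the suggestion '%d እ.ኤ.አ.'. Checking a few sample years per language, instead of just one, would help catch titles where the year digits also appear elsewhere in the title.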
TASK DETAIL
https://phabricator.wikimedia.org/T104787

To: jayvdb
Cc: valhallasw, Xqt, Aklapper, Multichill, jayvdb, pywikibot-bugs-list, Ricordisamoa, Malyacko, P.Copp

_______________________________________________
pywikibot-bugs mailing list
pywikibot-bugs@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/pywikibot-bugs