jayvdb created this task.
jayvdb added subscribers: jayvdb, Multichill, Xqt, valhallasw.
jayvdb added projects: pywikibot-core, Pywikibot-pagegenerators, 
Pywikibot-Wikidata.
Herald added subscribers: pywikibot-bugs-list, Aklapper.

TASK DESCRIPTION
  YearPageGenerator and DayPageGenerator need some love, and present an 
interesting potential use of Wikidata.
  They are implemented using the date.py library, which hasn't been actively
maintained.
  
  * YearPageGenerator uses formatYear to generate a site-specific list of page
titles for the year articles (even if the pages don't exist).
  * DayPageGenerator uses FormatDate and getNumberOfDaysInMonth to generate a
site-specific list of page titles for each day of the year, always including
February 29, as no year is specified.
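  A minimal sketch of the two enumeration strategies described above, independent of date.py and pywikibot. The names year_titles and day_keys are hypothetical; the title formatter is passed in as a callable standing in for formatYear, and month lengths are taken from a leap year (2000, an arbitrary choice) so that February 29 is always produced.

```python
import calendar
from typing import Callable, Iterator, Tuple


def year_titles(format_year: Callable[[str, int], str], lang: str,
                start: int, end: int) -> Iterator[str]:
    """Yield a title for every year from start to end (inclusive).

    Titles are produced whether or not the page exists. Year 0 is skipped
    because the calendar goes straight from 1 BC to AD 1.
    """
    for year in range(start, end + 1):
        if year == 0:
            continue
        yield format_year(lang, year)


def day_keys() -> Iterator[Tuple[int, int]]:
    """Yield every (month, day) pair of a year, always including (2, 29).

    No concrete year is known, so month lengths are taken from a leap year
    (2000 is an arbitrary choice).
    """
    for month in range(1, 13):
        # calendar.monthrange returns (weekday of day 1, number of days).
        days_in_month = calendar.monthrange(2000, month)[1]
        for day in range(1, days_in_month + 1):
            yield month, day
```

Skipping year 0 is a choice of this sketch. With a plain '-' prefix formatter, year_titles(lambda lang, y: str(y), 'en', -1, 1) yields '-1' and then '1', and day_keys() yields 366 pairs.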
  
  formatYear supports BC and AD years, and transforms each year into a
localised page title, e.g. 2003 is pushed through '%F (میلادی)' for Farsi to
become '۲۰۰۳ (میلادی)'. However, a correct transform only exists for some
languages, and unknown languages raise a KeyError.
https://www.wikidata.org/wiki/Q1986 lists the 2003 article titles in many
languages, including some that date.py does not support, e.g. Amharic (am) is
[[https://am.wikipedia.org/wiki/2003_እ.ኤ.አ.|2003 እ.ኤ.አ.]]
  
  ```
  >>> pywikibot.date.formatYear('am', 2003)
  Traceback (most recent call last):
    File "<console>", line 1, in <module>
    File ".../pywikibot/date.py", line 2368, in formatYear
      return formats['YearAD'][lang](year)
  KeyError: 'am'
  ```
  
  date.py also appears to have incorrect transforms, e.g. the transform for
`jbo` is '%dmoi nanca', yet the Wikipedia page title is
[[https://jbo.wikipedia.org/wiki/2003moi|2003moi]].
  
  On the other hand, Wikidata doesn't have a complete list of year titles for
languages whose wiki lacks articles for some years, e.g. there is no Gujarati
(gu) sitelink for 2003 in Wikidata, yet the date.py table works correctly.
  
  ```
  >>> pywikibot.date.formatYear('gu', 2003)
  '૨૦૦૩'
  ```
  
  BC years vary much more, as languages have not commonly adopted a way to
express negative years, so words and abbreviations are used instead, e.g. a '-'
prefix or an 'a.C.' suffix.
  
  The date.py metadata is also very sketchy for BC years: some languages that
are supported for AD years raise a KeyError for BC lookups.
  
  ```
  >>> pywikibot.date.formatYear('th', 2003)
  'พ.ศ. 2546'
  >>> pywikibot.date.formatYear('th', -1)
  Traceback (most recent call last):
    File "/usr/lib64/python3.3/code.py", line 90, in runcode
      exec(code, self.locals)
    File "<console>", line 1, in <module>
    File ".../pywikibot/date.py", line 2366, in formatYear
      return formats['YearBC'][lang](-year)
  KeyError: 'th'
  ```
  
  With Wikidata it should be easy and efficient to confirm that formatYear
works correctly for some sample years, by comparing its result for each
language with the sitelinks, and to derive transforms for the languages where
it does not.
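  A rough sketch of that check, assuming the sitelink titles have already been fetched into a plain dict keyed by language code (the fetching step is left out, and compare_with_sitelinks and derive_template are hypothetical names). derive_template is one naive way to derive a transform: it only succeeds when the title contains the year in ASCII digits, so cases like the Farsi digits or the Thai Buddhist-era year would still need hand-written rules.

```python
from typing import Callable, Dict, List, Optional, Tuple


def compare_with_sitelinks(
        format_year: Callable[[str, int], str],
        year: int,
        titles: Dict[str, str]) -> Dict[str, object]:
    """Compare format_year(lang, year) with the sitelink title per language.

    titles maps a language code (e.g. 'jbo') to the Wikipedia page title
    linked from the year's Wikidata item.
    """
    ok: List[str] = []
    mismatched: Dict[str, Tuple[str, str]] = {}
    unsupported: List[str] = []
    for lang, title in sorted(titles.items()):
        try:
            generated = format_year(lang, year)
        except KeyError:
            # formatYear raises KeyError for languages without a transform.
            unsupported.append(lang)
            continue
        if generated == title:
            ok.append(lang)
        else:
            mismatched[lang] = (generated, title)
    return {'ok': ok, 'mismatched': mismatched, 'unsupported': unsupported}


def derive_template(samples: Dict[int, str]) -> Optional[str]:
    """Guess a '{year}' template from several year -> title samples.

    Returns None when a title does not contain the year in ASCII digits
    (localised digits, other calendars) or when the samples disagree.
    """
    templates = set()
    for year, title in samples.items():
        digits = str(year)
        if digits not in title:
            return None
        templates.add(title.replace(digits, '{year}', 1))
    # Only trust the template if every sample agrees on it.
    return templates.pop() if len(templates) == 1 else None
```

In practice format_year would be pywikibot.date.formatYear, and the titles would come from the Wikipedia sitelinks of an item such as Q1986, assuming the sitelink keys (e.g. 'amwiki') are first mapped to language codes. A formatter using the jbo template '%dmoi nanca' would presumably land under 'mismatched' against '2003moi', and derive_template({2003: '2003moi', 1999: '1999moi'}) returns '{year}moi'.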
  
  For languages with many Wikipedia year pages, we could extract all of the page
titles from the Wikipedia sitelinks, provided there are no missing pages. We
could use #wikidataquery to get a list of all instances of year
<https://www.wikidata.org/wiki/Q577>. @multichill, can we use #sparql for this?
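  One possible SPARQL formulation, untested, assuming the prefixes predefined by the Wikidata Query Service (https://query.wikidata.org/) and that year items are direct instances (P31) of Q577; items typed only through a subclass of year would be missed. The Python wrapper only builds the query text for one Wikipedia language and parses the standard SPARQL 1.1 JSON results format; sending the HTTP request is left out, and build_query and parse_bindings are hypothetical names.

```python
from typing import Any, Dict, List, Tuple

# Assumption: year items are direct instances (P31) of year (Q577).
# Doubled braces survive str.format() as literal SPARQL braces.
YEAR_TITLES_QUERY = """
SELECT ?item ?title WHERE {{
  ?item wdt:P31 wd:Q577 .
  ?article schema:about ?item ;
           schema:isPartOf <https://{lang}.wikipedia.org/> ;
           schema:name ?title .
}}
"""


def build_query(lang: str) -> str:
    """Return a query for all year items with a sitelink on lang.wikipedia."""
    return YEAR_TITLES_QUERY.format(lang=lang)


def parse_bindings(result: Dict[str, Any]) -> List[Tuple[str, str]]:
    """Turn a SPARQL 1.1 JSON result into (QID, title) pairs."""
    pairs = []
    for row in result['results']['bindings']:
        # Items come back as full entity URIs; keep only the trailing QID.
        qid = row['item']['value'].rsplit('/', 1)[-1]
        pairs.append((qid, row['title']['value']))
    return pairs
```

The resulting titles could then be fed to a comparison against formatYear, skipping any year that has no sitelink in that language.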
  
  More importantly, it would be good if the prefix/suffix/format string could
be loaded from an externally maintained data set. Is there a TWN message or
Wikidata item with a multilingual format string for years?
[[https://www.wikidata.org/wiki/Q159791|A.D.]] doesn't have any relevant
statements.
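  If such a data set existed, the lookup itself could be very small. The JSON layout here (a per-language object with 'ad' and 'bc' templates using a {year} placeholder) is purely an assumption for illustration, not an existing TWN or Wikidata format, and load_year_formats and format_year are hypothetical names. Passing abs(year) to the BC template lets a language express BC with either a '-{year}' prefix or a '{year} a.C.' style suffix.

```python
import json
from typing import Dict


def load_year_formats(text: str) -> Dict[str, Dict[str, str]]:
    """Parse a JSON mapping of language code -> {'ad': ..., 'bc': ...}."""
    return json.loads(text)


def format_year(formats: Dict[str, Dict[str, str]], lang: str,
                year: int) -> str:
    """Format year for lang using externally maintained templates.

    Negative years are BC and the template receives the absolute value.
    Raises KeyError, as formatYear does, when no template exists.
    """
    if year == 0:
        raise ValueError('there is no year 0')
    era = 'bc' if year < 0 else 'ad'
    return formats[lang][era].format(year=abs(year))
```

For example, with formats loaded from '{"jbo": {"ad": "{year}moi"}}', format_year(formats, 'jbo', 2003) returns '2003moi', while format_year(formats, 'jbo', -1) raises KeyError, much like the Thai BC lookup above. Localised digits, as in the Farsi example, would need an extra digit-mapping step that this sketch leaves out.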
  
  It would also be worth investigating other libraries with good language
support for formatted dates; we could then contribute this knowledge back to an
external library that is better maintained.
  
  On T102174, @valhallasw suggested PyICU, which does appear to have reasonable
language support. http://userguide.icu-project.org/formatparse/datetime
  There are no doubt other libraries that are also worth evaluating.
  
  Also worth discussing: what are these two generators and date.py actually
used for? Or should we simply drop date.py and these two generators?

TASK DETAIL
  https://phabricator.wikimedia.org/T104787

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: jayvdb
Cc: valhallasw, Xqt, Aklapper, Multichill, jayvdb, pywikibot-bugs-list, 
Ricordisamoa, Malyacko, P.Copp



_______________________________________________
pywikibot-bugs mailing list
pywikibot-bugs@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/pywikibot-bugs
