[Wikidata-bugs] [Maniphest] [Commented On] T230833: wbsearchentities for lexemes returns 'und' match language on Test Wikidata

Nikki Thu, 30 Apr 2020 06:43:03 -0700

Nikki added a comment.


  But what would the correct statements be? We can't add an ISO 639-1 code if 
the language doesn't have one! :) All existing ISO 639-1 codes are (or should 
be) already in Wikidata - there are only ~200, and new ones are no longer being 
assigned. If we want to return anything other than `und` for the thousands of 
other potential languages, we would need to use something else, such as `P220` 
(ISO 639-3 code) or `P305` (IETF language tag).
  
  Using IETF language tags seems like the most useful solution to me. ISO 639-1 
is too limited. Using only ISO 639-3 would be awkward because those are always 
three-letter codes (e.g. `en` would turn into `eng`, `de` into `deu`). Falling 
back to ISO 639-3 when there's no ISO 639-1 code would be an improvement, but 
that's essentially how IETF language tags are assigned anyway. English has the 
ISO 639-1 code `en` and the ISO 639-3 code `eng`, and its IETF language tag is 
`en`. Scots has no ISO 639-1 code, only the ISO 639-3 code `sco`, and its IETF 
language tag is `sco`.
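  
  As a rough illustration of that fallback (a minimal sketch only: 
`match_language` is a hypothetical helper, not how Wikibase actually resolves 
the match language, and it assumes the two codes have already been looked up 
for the lexeme's language item):

```python
from typing import Optional


def match_language(iso_639_1: Optional[str], iso_639_3: Optional[str]) -> str:
    """Hypothetical helper: pick a language code for a search match.

    Prefers the two-letter ISO 639-1 code, falls back to the three-letter
    ISO 639-3 code, and only returns 'und' (undetermined) when neither is
    known. Empty strings are treated the same as missing values.
    """
    return iso_639_1 or iso_639_3 or "und"


print(match_language("en", "eng"))  # en  (English has both codes)
print(match_language(None, "sco"))  # sco (Scots has no ISO 639-1 code)
print(match_language(None, None))   # und (nothing known)
```

  For these two examples the result is the same as the IETF language tag, so 
reading `P305` directly would make the fallback unnecessary.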
  
  Here's a query listing all the languages currently used for lexemes that 
don't have an ISO 639-1 code: https://w.wiki/PXQ

TASK DETAIL
  https://phabricator.wikimedia.org/T230833

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: Nikki
Cc: Addshore, Lydia_Pintscher, Nikki, LucasWerkmeister, darthmon_wmde, Nandana, 
Mringgaard, Lahi, Gq86, GoranSMilovanovic, QZanden, LawExplorer, _jensen, 
rosalieper, Scott_WUaS, Jonas, Wikidata-bugs, aude, Mbch331

_______________________________________________
Wikidata-bugs mailing list
[email protected]
https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs
