Hi,

What is the procedure for adding a language profile to LanguageIdentifier? Do we 
use Wikipedia as the training set?
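For context, a language profile in an n-gram based identifier is usually a table of character n-gram frequencies counted over a training corpus. The sketch below assumes character trigrams, with '_' marking word boundaries. The class and method names are hypothetical, and this is not necessarily the format LanguageIdentifier uses to store its profiles.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

public class TrigramProfile {
    // Counts character trigrams in a training text. Each word is padded with '_'
    // on both sides so that word beginnings and endings yield their own trigrams.
    // Illustrative sketch only; not LanguageIdentifier's actual profile format.
    static Map<String, Integer> build(String text) {
        Map<String, Integer> counts = new HashMap<>();
        for (String word : text.toLowerCase(Locale.ROOT).split("\\s+")) {
            if (word.isEmpty()) {
                continue; // leading whitespace produces an empty first token
            }
            String padded = "_" + word + "_";
            for (int i = 0; i + 3 <= padded.length(); i++) {
                counts.merge(padded.substring(i, i + 3), 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        // "ikke" (Bokmål) and "ikkje" (Nynorsk) both mean "not"
        Map<String, Integer> profile = build("ikke ikkje");
        System.out.println(profile.get("_ik")); // 2
        System.out.println(profile.get("kje")); // 1
    }
}
```

If you ran this over a Bokmål corpus and a Nynorsk corpus, trigrams such as "kje" would be among the features that separate the two profiles.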

I'd like to add some languages relevant for Norway.
In Norway there are two official languages: nb and nn. These tags are 
recommended instead of the generic "no" tag.

We also have a third language, Sami, which comes in Northern Sami and Southern Sami. 
The referenced ISO-639 list (http://www.w3.org/WAI/ER/IG/ert/iso639.htm) is 
outdated, as it lists none of these. A better list is 
http://www.loc.gov/standards/iso639-2/php/code_list.php

What if we need to represent regional variants such as en-US and en-GB? 
ISO-639 does not cover those. Perhaps it would be better to switch to RFC 5646 
and the IANA Language Subtag Registry (http://rishida.net/utils/subtags/), 
which builds on ISO-639-1 and ISO-639-2 but also allows region subtags?
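The JDK's java.util.Locale can already parse RFC 5646 (BCP 47) tags, so a tag-based scheme might not need a custom parser. Below is a minimal sketch that assumes tags arrive as plain strings. The class name TagDemo and the describe helper are for illustration only.

```java
import java.util.Locale;

public class TagDemo {
    // Splits a BCP 47 / RFC 5646 tag into its language and region subtags.
    // Hypothetical helper for illustration; region is empty when the tag has none.
    static String describe(String tag) {
        Locale loc = Locale.forLanguageTag(tag);
        return "language=" + loc.getLanguage() + ", region=" + loc.getCountry();
    }

    public static void main(String[] args) {
        // nb/nn: Norwegian Bokmål/Nynorsk; se: Northern Sami (ISO 639-1);
        // sma: Southern Sami (ISO 639-2 only, no two-letter code)
        String[] tags = {"nb", "nn", "se", "sma", "en-US", "en-GB"};
        for (String tag : tags) {
            System.out.println(tag + " -> " + describe(tag));
        }
    }
}
```

A single tag format like this covers both cases: plain language codes (nb, nn, se, sma) and language-plus-region variants (en-US, en-GB).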

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com
Training in Europe - www.solrtraining.com
