Hi,

What is the procedure for adding a language profile to LanguageIdentifier? Is Wikipedia used as the training set?
I'd like to add some languages relevant for Norway. Norway has two official languages, nb and nn, and it is recommended to use these instead of the common "no" tag. We also have a third language, Sami, which comes in variants such as Northern Sami and Southern Sami.

The referenced ISO-639 list (http://www.w3.org/WAI/ER/IG/ert/iso639.htm) is obsolete, as it does not list any of these. A better list is http://www.loc.gov/standards/iso639-2/php/code_list.php

What if we have a requirement to represent language dialects such as en-US and en-GB? ISO-639 does not deal with regional variants. Perhaps it would be better to switch to RFC 5646 and the IANA Language Subtag Registry (http://rishida.net/utils/subtags/), which uses ISO-639-1 and ISO-639-2 but allows for region variants as well?

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com
Training in Europe - www.solrtraining.com
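
P.S. To illustrate the kind of tag handling I have in mind, here is a minimal sketch. It assumes Java 7 or later, where java.util.Locale can parse RFC 5646 style tags. The class name LanguageTagDemo and its two helper methods are made up for this example and are not part of LanguageIdentifier. I am also assuming the registry codes "se" and "sma" for Northern and Southern Sami.

```java
import java.util.Locale;

// Hypothetical demo class, not part of LanguageIdentifier.
public class LanguageTagDemo {

    // Returns the primary language subtag, e.g. "en" for "en-GB".
    public static String language(String tag) {
        return Locale.forLanguageTag(tag).getLanguage();
    }

    // Returns the region subtag, or an empty string if the tag has none.
    public static String region(String tag) {
        return Locale.forLanguageTag(tag).getCountry();
    }

    public static void main(String[] args) {
        // "se" and "sma" are assumed codes for Northern and Southern Sami.
        String[] tags = {"nb", "nn", "se", "sma", "en-US", "en-GB"};
        for (String tag : tags) {
            System.out.println(tag + " -> language=" + language(tag)
                    + ", region=" + region(tag));
        }
    }
}
```

With tags like these, a profile could be keyed on the language subtag alone, while the region subtag stays available for cases where a dialect distinction matters.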
