Hi,

Does anyone have an answer to the question I posted last week?
I know how to generate profiles for Nutch, but not for Tika.

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com
Training in Europe - www.solrtraining.com

On 20. aug. 2010, at 22.07, Jan Høydahl / Cominvent wrote:

> Hi,
> 
> What is the procedure to add a language profile to LanguageIdentifier? Do we 
> use Wikipedia as training set?
> 
> I'd like to add some languages relevant for Norway.
> In Norway there are two official languages: nb and nn. These are the 
> recommended tags, to be used instead of the common "no" tag.
> 
> We also have a third language, Sami. There are Northern Sami and Southern 
> Sami. The referenced ISO-639 list 
> (http://www.w3.org/WAI/ER/IG/ert/iso639.htm) is obsolete as it does not list 
> any of these. A better list is 
> http://www.loc.gov/standards/iso639-2/php/code_list.php
> 
> What if we have a requirement to represent language dialects such as en-US 
> and en-GB? ISO-639 does not cover these. Perhaps it is better to switch to 
> RFC 5646 and IANA Language Subtag Registry 
> (http://rishida.net/utils/subtags/) which uses ISO-639-1 and ISO-639-2 but 
> allows for region variants as well?
> 
> --
> Jan Høydahl, search solution architect
> Cominvent AS - www.cominvent.com
> Training in Europe - www.solrtraining.com
> 
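
To make the en-US / en-GB question above concrete, here is a minimal sketch of how RFC 5646 style tags split into a language part and a region part. It uses only the JDK's java.util.Locale (Java 7 or later) and does not touch Tika or LanguageIdentifier; the class and method names are made up for illustration. The tags "se" (Northern Sami, ISO-639-1) and "sma" (Southern Sami, ISO-639-2) are my assumption of the codes that would apply.

```java
import java.util.Locale;

// Hypothetical helper, not part of Tika: splits RFC 5646 language tags
// into their language and region subtags using the JDK only.
public class LanguageTagDemo {

    // Primary language subtag, e.g. "en" for "en-GB", "nb" for "nb".
    static String primaryLanguage(String tag) {
        return Locale.forLanguageTag(tag).getLanguage();
    }

    // Region subtag, e.g. "GB" for "en-GB"; empty string when the tag has none.
    static String region(String tag) {
        return Locale.forLanguageTag(tag).getCountry();
    }

    public static void main(String[] args) {
        // Assumed tags: nb/nn for Norwegian, se/sma for Sami, plus two English regions.
        String[] tags = {"nb", "nn", "se", "sma", "en-US", "en-GB"};
        for (String tag : tags) {
            System.out.println(tag + " -> language=" + primaryLanguage(tag)
                    + ", region=" + region(tag));
        }
    }
}
```

If the identifier kept the full tag as its key, a caller that only has an "en" profile could still fall back from "en-GB" to "en" by looking at the primary language subtag.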
