Re: [HACKERS] suitable text search configuration
Have we got consensus that initdb should just look at the first component of the locale name to choose a text search configuration (at least for 8.3)? If so, who's going to make the change? I can do it but don't want to duplicate effort if someone else was already on it.

regards, tom lane

---(end of broadcast)---
TIP 4: Have you searched our list archives? http://archives.postgresql.org
Re: [HACKERS] suitable text search configuration
Tom Lane wrote:
> Have we got consensus that initdb should just look at the first component of the locale name to choose a text search configuration (at least for 8.3)? If so, who's going to make the change? I can do it but don't want to duplicate effort if someone else was already on it.

Thanks, it works wonderfully for me now.

--
Alvaro Herrera  http://www.amazon.com/gp/registry/CTMLCN8V17R4
Not even the greatest genius would get very far if he had to draw everything from within himself (Goethe)
Re: [HACKERS] suitable text search configuration
Alvaro Herrera [EMAIL PROTECTED] writes:
> ... oh, I see there's a table in initdb.c
>
> Are we supposed to add entries to it, one for each country? I'm wondering if we should try to match the part before the _ using just the language, if the complete match fails. (i.e. match es_CL using just es, fr_CA using just fr, etc).

Actually, looking at the examples so far, I'm thinking we should just consider the string up to the first _, period.

An alternative is to try to match the full locale (es_ES) and then try the language (es) if that wasn't found. That would leave room to put country-by-country exceptions in, but for the moment we'd not have any.

regards, tom lane
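For readers following along, the "string up to the first _" rule can be sketched roughly as below. This is an illustration in Python, not the C code in initdb.c; the table contents, the `find_config` name, and the choice of `simple` as the fallback are all assumptions made for the example.

```python
# Hypothetical lookup table: language code -> text search configuration.
# The entries are illustrative only, not the actual table in initdb.c.
CONFIG_BY_LANGUAGE = {
    "en": "english",
    "es": "spanish",
    "fr": "french",
    "pt": "portuguese",
}

# Assumed fallback when no language matches.
DEFAULT_CONFIG = "simple"


def language_prefix(locale_name: str) -> str:
    """Return everything before the first '_' in the locale name."""
    return locale_name.split("_", 1)[0]


def find_config(locale_name: str) -> str:
    """Choose a configuration from the first locale component only."""
    return CONFIG_BY_LANGUAGE.get(language_prefix(locale_name), DEFAULT_CONFIG)


if __name__ == "__main__":
    for name in ("es_CL", "fr_CA.UTF-8", "C"):
        print(name, "->", find_config(name))
```

Under this rule es_CL and es_ES land on the same configuration without needing one table entry per country. Note that the sketch splits only on `_`, so a name with no underscore, such as `es.UTF-8`, would fall through to the default.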
Re: [HACKERS] suitable text search configuration
Tom Lane wrote:
> Alvaro Herrera [EMAIL PROTECTED] writes:
>> ... oh, I see there's a table in initdb.c
>>
>> Are we supposed to add entries to it, one for each country? I'm wondering if we should try to match the part before the _ using just the language, if the complete match fails. (i.e. match es_CL using just es, fr_CA using just fr, etc).
>
> Actually, looking at the examples so far, I'm thinking we should just consider the string up to the first _, period.
>
> An alternative is to try to match the full locale (es_ES) and then try the language (es) if that wasn't found. That would leave room to put country-by-country exceptions in, but for the moment we'd not have any.

Can anyone point to a real world example where country by country would make sense? If we need to distinguish flavors of some languages, I would not be at all surprised if this was not by country anyway.

cheers

andrew
Re: [HACKERS] suitable text search configuration
Andrew Dunstan [EMAIL PROTECTED] writes:
> Tom Lane wrote:
>> Actually, looking at the examples so far, I'm thinking we should just consider the string up to the first _, period.
>
> Can anyone point to a real world example where country by country would make sense?

For the current set of built-in dictionaries it seems pretty clear that country distinctions are useless. If we ever did need that distinction it would only be after adding dictionaries that aren't going to be in 8.3 ... so I'm leaning to keeping the code simple for the moment.

regards, tom lane
Re: [HACKERS] suitable text search configuration
Andrew Dunstan wrote:
> Tom Lane wrote:
>> Actually, looking at the examples so far, I'm thinking we should just consider the string up to the first _, period.

I studied the standards a bit to see if they mandated that the locale names must be in the form language_COUNTRY, and couldn't find anything. Which makes me think it's mostly by (very well established) convention. I think trying to parse the _ should not be done on a first attempt.

>> An alternative is to try to match the full locale (es_ES) and then try the language (es) if that wasn't found. That would leave room to put country-by-country exceptions in, but for the moment we'd not have any.
>
> Can anyone point to a real world example where country by country would make sense? If we need to distinguish flavors of some languages, I would not be at all surprised if this was not by country anyway.

pt_BR versus pt_PT. I'm not sure if it makes a difference to a stemmer, but maybe to a thesaurus it does ...

--
Alvaro Herrera  http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.
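The alternative discussed here (try the full locale first, then fall back to the language) might look something like the following Python sketch. Everything in it is hypothetical: the function and table names, the stripping of `.encoding` and `@modifier` suffixes before the full match, and above all the `portuguese_br` configuration, which is invented purely to show how a pt_BR exception would slot in. The exceptions table is empty by default, matching the remark that "for the moment we'd not have any".

```python
# Hypothetical tables for illustration; not taken from initdb.c.
LANGUAGE_CONFIGS = {"es": "spanish", "pt": "portuguese"}
COUNTRY_EXCEPTIONS = {}  # empty for now: no country-specific entries


def find_config_with_fallback(locale_name,
                              exceptions=COUNTRY_EXCEPTIONS,
                              languages=LANGUAGE_CONFIGS,
                              default="simple"):
    """Match the full language_COUNTRY name first, then the language alone."""
    # Assumption: drop "@modifier" and ".encoding" so that "pt_BR.UTF-8"
    # is compared as "pt_BR".
    base = locale_name.split("@", 1)[0].split(".", 1)[0]
    if base in exceptions:
        return exceptions[base]
    # Fall back to the part before the first '_'.
    language = base.split("_", 1)[0]
    return languages.get(language, default)


if __name__ == "__main__":
    # With no exceptions, both Portuguese locales share one configuration.
    print(find_config_with_fallback("pt_BR.UTF-8"))
    print(find_config_with_fallback("pt_PT"))
    # "portuguese_br" is a made-up name, used only to show an exception entry.
    print(find_config_with_fallback("pt_BR", exceptions={"pt_BR": "portuguese_br"}))
```

With an empty exceptions table this behaves the same as looking at the language alone, so the extra lookup only pays off once dictionaries that distinguish countries actually exist.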
Re: [HACKERS] suitable text search configuration
Alvaro Herrera [EMAIL PROTECTED] writes:
> Andrew Dunstan wrote:
>> Can anyone point to a real world example where country by country would make sense? If we need to distinguish flavors of some languages, I would not be at all surprised if this was not by country anyway.
>
> pt_BR versus pt_PT. I'm not sure if it makes a difference to a stemmer, but maybe to a thesaurus it does ...

Right, but only when we have built-in dictionaries that separately address the two countries will there be any need to teach initdb about it. I think we should KISS for now.

regards, tom lane
Re: [HACKERS] suitable text search configuration
Tom Lane wrote:
> Alvaro Herrera [EMAIL PROTECTED] writes:
>> ... oh, I see there's a table in initdb.c
>>
>> Are we supposed to add entries to it, one for each country? I'm wondering if we should try to match the part before the _ using just the language, if the complete match fails. (i.e. match es_CL using just es, fr_CA using just fr, etc).
>
> Actually, looking at the examples so far, I'm thinking we should just consider the string up to the first _, period.

I found that there is an ISO spec for cultural elements, ISO/IEC 15897, a working draft for which can be found at http://www.open-std.org/jtc1/sc22/open/n3586.pdf

Chapter 13 talks about naming of locales. I think glibc is supposed to follow this standard.

--
Alvaro Herrera  http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support