Re: [HACKERS] suitable text search configuration

2007-10-25 Thread Tom Lane
Have we got consensus that initdb should just look at the first
component of the locale name to choose a text search configuration
(at least for 8.3)?  If so, who's going to make the change?
I can do it but don't want to duplicate effort if someone else
was already on it.

regards, tom lane

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [HACKERS] suitable text search configuration

2007-10-25 Thread Alvaro Herrera
Tom Lane wrote:
> Have we got consensus that initdb should just look at the first
> component of the locale name to choose a text search configuration
> (at least for 8.3)?  If so, who's going to make the change?
> I can do it but don't want to duplicate effort if someone else
> was already on it.

Thanks, it works wonderfully for me now.

-- 
Alvaro Herrera http://www.amazon.com/gp/registry/CTMLCN8V17R4
Ni aun el genio muy grande llegaría muy lejos
si tuviera que sacarlo todo de su propio interior (Goethe)



Re: [HACKERS] suitable text search configuration

2007-10-24 Thread Tom Lane
Alvaro Herrera [EMAIL PROTECTED] writes:
> ... oh, I see there's a table in initdb.c
>
> Are we supposed to add entries to it, one for each country?  I'm
> wondering if we should try to match the part before the _ using just the
> language, if the complete match fails.  (i.e. match es_CL using just
> es, fr_CA using just fr, etc).

Actually, looking at the examples so far, I'm thinking we should just
consider the string up to the first _, period.

An alternative is to try to match the full locale (es_ES) and then try
the language (es) if that wasn't found.  That would leave room to put
country-by-country exceptions in, but for the moment we'd not have any.

regards, tom lane
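
A minimal sketch of the two lookup strategies discussed in this message, offered only as an illustration and not as the actual initdb.c code. The table contents, the struct, and the function names (`lookup_config`, `config_by_language`, `config_by_locale_then_language`) are hypothetical. Stopping at `.` or `@` as well as `_`, so that a name such as `es.UTF-8` still yields `es`, is an assumption that the thread does not discuss.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical mapping from a locale key to a text search configuration. */
struct ts_config_match
{
    const char *tsconfname;     /* text search configuration name */
    const char *key;            /* language ("es") or full locale ("es_ES") */
};

/* Illustrative entries only; there are no per-country exceptions yet. */
static const struct ts_config_match example_matches[] = {
    {"english", "en"},
    {"french", "fr"},
    {"portuguese", "pt"},
    {"spanish", "es"},
    {NULL, NULL}                /* end marker */
};

/* Return the configuration whose key equals the first keylen bytes of s. */
static const char *
lookup_config(const char *s, size_t keylen)
{
    size_t      i;

    for (i = 0; example_matches[i].key != NULL; i++)
    {
        const char *key = example_matches[i].key;

        if (strlen(key) == keylen && strncmp(key, s, keylen) == 0)
            return example_matches[i].tsconfname;
    }
    return NULL;
}

/* Strategy 1: consider only the string up to the first '_'. */
const char *
config_by_language(const char *locale)
{
    /* Assumption: also stop at '.' or '@' when there is no '_'. */
    return lookup_config(locale, strcspn(locale, "_.@"));
}

/* Strategy 2: try the full locale (es_ES) first, then the language (es). */
const char *
config_by_locale_then_language(const char *locale)
{
    /* Assumption: the encoding and modifier suffixes are not part of the key. */
    const char *result = lookup_config(locale, strcspn(locale, ".@"));

    if (result == NULL)
        result = config_by_language(locale);
    return result;
}
```

With a table that holds only language entries, both functions return the same answer for every input. The second differs only once a full-locale key such as a country-specific entry is added to the table.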



Re: [HACKERS] suitable text search configuration

2007-10-24 Thread Andrew Dunstan
Tom Lane wrote:
> Alvaro Herrera [EMAIL PROTECTED] writes:
>> ... oh, I see there's a table in initdb.c
>>
>> Are we supposed to add entries to it, one for each country?  I'm
>> wondering if we should try to match the part before the _ using just the
>> language, if the complete match fails.  (i.e. match es_CL using just
>> es, fr_CA using just fr, etc).
>
> Actually, looking at the examples so far, I'm thinking we should just
> consider the string up to the first _, period.
>
> An alternative is to try to match the full locale (es_ES) and then try
> the language (es) if that wasn't found.  That would leave room to put
> country-by-country exceptions in, but for the moment we'd not have any.

Can anyone point to a real world example where country by country would 
make sense? If we need to distinguish flavors of some languages, I would 
not be at all surprised if this was not by country anyway.


cheers

andrew



Re: [HACKERS] suitable text search configuration

2007-10-24 Thread Tom Lane
Andrew Dunstan [EMAIL PROTECTED] writes:
> Tom Lane wrote:
>> Actually, looking at the examples so far, I'm thinking we should just
>> consider the string up to the first _, period.
>
> Can anyone point to a real world example where country by country would
> make sense?

For the current set of built-in dictionaries it seems pretty clear that
country distinctions are useless.  If we ever did need that distinction
it would only be after adding dictionaries that aren't going to be in
8.3 ... so I'm leaning to keeping the code simple for the moment.

regards, tom lane



Re: [HACKERS] suitable text search configuration

2007-10-24 Thread Alvaro Herrera
Andrew Dunstan wrote:
> Tom Lane wrote:
>> Actually, looking at the examples so far, I'm thinking we should just
>> consider the string up to the first _, period.

I studied the standards a bit to see if they mandated that the locale
names must be in the form language_COUNTRY, and couldn't find
anything.  Which makes me think it's mostly by (very well established)
convention.  I think trying to parse the _ should not be done on a first
attempt.

>> An alternative is to try to match the full locale (es_ES) and then try
>> the language (es) if that wasn't found.  That would leave room to put
>> country-by-country exceptions in, but for the moment we'd not have any.
>
> Can anyone point to a real world example where country by country would
> make sense? If we need to distinguish flavors of some languages, I would
> not be at all surprised if this was not by country anyway.

pt_BR versus pt_PT.  I'm not sure if it makes a difference to a stemmer,
but maybe to a thesaurus it does ...

-- 
Alvaro Herrera                http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.



Re: [HACKERS] suitable text search configuration

2007-10-24 Thread Tom Lane
Alvaro Herrera [EMAIL PROTECTED] writes:
> Andrew Dunstan wrote:
>> Can anyone point to a real world example where country by country would
>> make sense? If we need to distinguish flavors of some languages, I would
>> not be at all surprised if this was not by country anyway.
>
> pt_BR versus pt_PT.  I'm not sure if it makes a difference to a stemmer,
> but maybe to a thesaurus it does ...

Right, but only when we have built-in dictionaries that separately
address the two countries will there be any need to teach initdb about
it.  I think we should KISS for now.

regards, tom lane



Re: [HACKERS] suitable text search configuration

2007-10-24 Thread Alvaro Herrera
Tom Lane wrote:
> Alvaro Herrera [EMAIL PROTECTED] writes:
>> ... oh, I see there's a table in initdb.c
>>
>> Are we supposed to add entries to it, one for each country?  I'm
>> wondering if we should try to match the part before the _ using just the
>> language, if the complete match fails.  (i.e. match es_CL using just
>> es, fr_CA using just fr, etc).
>
> Actually, looking at the examples so far, I'm thinking we should just
> consider the string up to the first _, period.

I found that there is an ISO spec for cultural elements, ISO/IEC
15897, a working draft for which can be found at
http://www.open-std.org/jtc1/sc22/open/n3586.pdf

Chapter 13 talks about naming of locales.

I think glibc is supposed to follow this standard.

-- 
Alvaro Herrera                http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support
