Re: [HACKERS] More message encoding woes

2009-04-08 Thread Heikki Linnakangas
Peter Eisentraut wrote: On Tuesday 07 April 2009 13:09:42 Heikki Linnakangas wrote: Patch attached. Instead of checking for LC_CTYPE == C, I'm checking pg_get_encoding_from_locale(NULL) == encoding which is more close to what we actually want. The downside is that

Re: [HACKERS] More message encoding woes

2009-04-07 Thread Heikki Linnakangas
Peter Eisentraut wrote: In practice you get either the GNU or the Solaris version of gettext, and at least the GNU version can cope with all the encoding names that the currently Windows-only code path produces. It doesn't. On my laptop running Debian testing: hlinn...@heikkilaptop:~$

Re: [HACKERS] More message encoding woes

2009-04-07 Thread Peter Eisentraut
On Tuesday 07 April 2009 11:21:25 Heikki Linnakangas wrote: Peter Eisentraut wrote: In practice you get either the GNU or the Solaris version of gettext, and at least the GNU version can cope with all the encoding names that the currently Windows-only code path produces. It doesn't. On my

Re: [HACKERS] More message encoding woes

2009-04-07 Thread Heikki Linnakangas
Hiroshi Inoue wrote: Heikki Linnakangas wrote: I just tried that, and it seems that gettext() does transliteration, so any characters that have no counterpart in the database encoding will be replaced with something similar, or question marks. Assuming that's universal across platforms, and I

Re: [HACKERS] More message encoding woes

2009-04-07 Thread Hiroshi Inoue
Heikki Linnakangas wrote: Hiroshi Inoue wrote: Heikki Linnakangas wrote: I just tried that, and it seems that gettext() does transliteration, so any characters that have no counterpart in the database encoding will be replaced with something similar, or question marks. Assuming that's

Re: [HACKERS] More message encoding woes

2009-04-07 Thread Heikki Linnakangas
Hiroshi Inoue wrote: What is wrong with checking if the codeset is valid using iconv_open()? That would probably work as well. We'd have to decide what we'd try to convert from with iconv_open(). Utf-8 might be a safe choice. We don't currently use iconv_open() anywhere in the backend,
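
A minimal sketch of the check Hiroshi suggests, assuming UTF-8 as the source encoding as Heikki proposes. The helper name `codeset_is_known` is hypothetical and does not come from the thread or from the PostgreSQL source; it only reports whether the platform's iconv recognizes the name.

```c
#include <iconv.h>
#include <stdbool.h>

/*
 * Hypothetical helper: report whether iconv can open a conversion
 * from UTF-8 to the given codeset name. A failure here suggests the
 * name is not a spelling this platform's iconv recognizes.
 */
static bool
codeset_is_known(const char *codeset)
{
    /* iconv_open() takes the target encoding first, then the source. */
    iconv_t cd = iconv_open(codeset, "UTF-8");

    if (cd == (iconv_t) -1)
        return false;

    iconv_close(cd);
    return true;
}
```

Whether a name accepted by iconv is also accepted by gettext on every platform is an assumption this sketch does not verify.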

Re: [HACKERS] More message encoding woes

2009-04-07 Thread Heikki Linnakangas
Peter Eisentraut wrote: On Tuesday 07 April 2009 11:21:25 Heikki Linnakangas wrote: Using the name for the latin1 encoding in the currently Windows-only mapping table, LATIN1, you get no translation because that name is not recognized by the system. Using the other name ISO-8859-1, it works.

Re: [HACKERS] More message encoding woes

2009-04-07 Thread Tom Lane
Heikki Linnakangas heikki.linnakan...@enterprisedb.com writes: Hiroshi Inoue wrote: What is wrong with checking if the codeset is valid using iconv_open()? That would probably work as well. We'd have to decide what we'd try to convert from with iconv_open(). The problem I have with that is

Re: [HACKERS] More message encoding woes

2009-04-07 Thread Heikki Linnakangas
Peter Eisentraut wrote: On Tuesday 07 April 2009 13:09:42 Heikki Linnakangas wrote: Patch attached. Instead of checking for LC_CTYPE == C, I'm checking pg_get_encoding_from_locale(NULL) == encoding which is more close to what we actually want. The downside is that

Re: [HACKERS] More message encoding woes

2009-04-07 Thread Peter Eisentraut
On Tuesday 07 April 2009 13:09:42 Heikki Linnakangas wrote: Patch attached. Instead of checking for LC_CTYPE == C, I'm checking pg_get_encoding_from_locale(NULL) == encoding which is more close to what we actually want. The downside is that pg_get_encoding_from_locale(NULL) isn't exactly free,
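
The comparison described in the quoted patch can be illustrated roughly as follows. This is only a sketch: it uses POSIX `nl_langinfo(CODESET)` as a stand-in for `pg_get_encoding_from_locale(NULL)`, whose implementation is not shown in the thread, and the helper name `ctype_codeset_matches` is hypothetical. It assumes the caller has already set LC_CTYPE with `setlocale()`.

```c
#include <langinfo.h>
#include <stdbool.h>
#include <stddef.h>
#include <strings.h>

/*
 * Hypothetical helper: compare the codeset implied by the current
 * LC_CTYPE setting with an expected codeset name, ignoring case.
 * A plain name comparison is fragile, because platforms spell the
 * same encoding differently (for example LATIN1 vs. ISO-8859-1).
 */
static bool
ctype_codeset_matches(const char *expected)
{
    const char *current = nl_langinfo(CODESET);

    return current != NULL && strcasecmp(current, expected) == 0;
}
```

A real check would map both names to a canonical encoding identifier before comparing, rather than comparing the strings directly.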

Re: [HACKERS] More message encoding woes

2009-04-07 Thread Hiroshi Inoue
Tom Lane wrote: Heikki Linnakangas heikki.linnakan...@enterprisedb.com writes: Hiroshi Inoue wrote: What is wrong with checking if the codeset is valid using iconv_open()? That would probably work as well. We'd have to decide what we'd try to convert from with iconv_open(). The problem

Re: [HACKERS] More message encoding woes

2009-04-06 Thread Peter Eisentraut
On Monday 30 March 2009 15:52:37 Heikki Linnakangas wrote: In CVS HEAD, we call bind_textdomain_codeset() in SetDatabaseEncoding() which fixes that, but we only do it on Windows. In earlier versions we called it on all platforms, but only for UTF-8. It seems that we should call

Re: [HACKERS] More message encoding woes

2009-04-02 Thread Hiroshi Inoue
Heikki Linnakangas wrote: Tom Lane wrote: Heikki Linnakangas heikki.linnakan...@enterprisedb.com writes: Tom Lane wrote: Maybe use a special string Translate Me First that doesn't actually need to be end-user-visible, just so no one sweats over getting it right in context. Yep, something

Re: [HACKERS] More message encoding woes

2009-04-02 Thread Peter Eisentraut
On Monday 30 March 2009 15:52:37 Heikki Linnakangas wrote: What is happening is that gettext() returns the message in the encoding determined by LC_CTYPE, while we expect it to return it in the database encoding. Starting with PG 8.3 we enforce that the encoding specified in LC_CTYPE matches

Re: [HACKERS] More message encoding woes

2009-04-02 Thread Hiroshi Inoue
Hiroshi Inoue wrote: Heikki Linnakangas wrote: Tom Lane wrote: Heikki Linnakangas heikki.linnakan...@enterprisedb.com writes: Tom Lane wrote: Maybe use a special string Translate Me First that doesn't actually need to be end-user-visible, just so no one sweats over getting it right in

Re: [HACKERS] More message encoding woes

2009-04-01 Thread Heikki Linnakangas
Tom Lane wrote: Heikki Linnakangas heikki.linnakan...@enterprisedb.com writes: Tom Lane wrote: Maybe use a special string Translate Me First that doesn't actually need to be end-user-visible, just so no one sweats over getting it right in context. Yep, something like that. There seems to be

Re: [HACKERS] More message encoding woes

2009-04-01 Thread Alvaro Herrera
Tom Lane wrote: Alvaro Herrera alvhe...@commandprompt.com writes: One problem with this idea is that it may be hard to coerce gettext into putting a particular string at the top of the file :-( I doubt we can, which is why the documentation needs to tell translators about it. I doubt

Re: [HACKERS] More message encoding woes

2009-04-01 Thread Hiroshi Inoue
Heikki Linnakangas wrote: Tom Lane wrote: Heikki Linnakangas heikki.linnakan...@enterprisedb.com writes: Tom Lane wrote: Maybe use a special string Translate Me First that doesn't actually need to be end-user-visible, just so no one sweats over getting it right in context. Yep, something

Re: [HACKERS] More message encoding woes

2009-04-01 Thread Tom Lane
Hiroshi Inoue in...@tpf.co.jp writes: Heikki Linnakangas wrote: I just tried that, and it seems that gettext() does transliteration, so any characters that have no counterpart in the database encoding will be replaced with something similar, or question marks. It doesn't occur in the

Re: [HACKERS] More message encoding woes

2009-04-01 Thread Hiroshi Inoue
Tom Lane wrote: Hiroshi Inoue in...@tpf.co.jp writes: Heikki Linnakangas wrote: I just tried that, and it seems that gettext() does transliteration, so any characters that have no counterpart in the database encoding will be replaced with something similar, or question marks. It doesn't

Re: [HACKERS] More message encoding woes

2009-03-31 Thread Heikki Linnakangas
Heikki Linnakangas wrote: One idea is to extract the encoding from LC_MESSAGES. Then call pg_get_encoding_from_locale() on that and check that it matches server_encoding. If it does, great, pass it to bind_textdomain_codeset(). If it doesn't, throw an error. I tried to implement this but it

Re: [HACKERS] More message encoding woes

2009-03-31 Thread Tom Lane
Heikki Linnakangas heikki.linnakan...@enterprisedb.com writes: I'm leaning towards the idea of trying out all the spellings of the database encoding we have in encoding_match_list. That gives the best user experience, as it just works, and it doesn't seem that complicated. How were you going

Re: [HACKERS] More message encoding woes

2009-03-31 Thread Heikki Linnakangas
Tom Lane wrote: Heikki Linnakangas heikki.linnakan...@enterprisedb.com writes: I'm leaning towards the idea of trying out all the spellings of the database encoding we have in encoding_match_list. That gives the best user experience, as it just works, and it doesn't seem that complicated.

Re: [HACKERS] More message encoding woes

2009-03-31 Thread Tom Lane
Heikki Linnakangas heikki.linnakan...@enterprisedb.com writes: Tom Lane wrote: Maybe use a special string Translate Me First that doesn't actually need to be end-user-visible, just so no one sweats over getting it right in context. Yep, something like that. There seems to be a magic empty

Re: [HACKERS] More message encoding woes

2009-03-31 Thread Alvaro Herrera
Tom Lane wrote: At first that sounded like an ideal answer, but I can see a gotcha: suppose the translation's author's name contains some characters that don't convert to the database encoding. I suppose that would result in failure, when we'd prefer it not to. A single-purpose string could

Re: [HACKERS] More message encoding woes

2009-03-31 Thread Tom Lane
Alvaro Herrera alvhe...@commandprompt.com writes: Tom Lane wrote: At first that sounded like an ideal answer, but I can see a gotcha: suppose the translation's author's name contains some characters that don't convert to the database encoding. I suppose that would result in failure, when

Re: [HACKERS] More message encoding woes

2009-03-31 Thread Peter Eisentraut
On Monday 30 March 2009 21:04:00 Tom Lane wrote: Heikki Linnakangas heikki.linnakan...@enterprisedb.com writes: Tom Lane wrote: Could we get away with just unconditionally calling bind_textdomain_codeset with *our* canonical spelling of the encoding name? If it works, great, and if it

Re: [HACKERS] More message encoding woes

2009-03-31 Thread Peter Eisentraut
On Monday 30 March 2009 20:06:48 Heikki Linnakangas wrote: Tom Lane wrote: Where does it get the default codeset from? Maybe we could constrain that to match the database encoding, the way we do for LC_COLLATE/CTYPE? LC_CTYPE. In 8.3 and up where we constrain that to match the database

Re: [HACKERS] More message encoding woes

2009-03-31 Thread Tom Lane
Peter Eisentraut pete...@gmx.net writes: On Monday 30 March 2009 20:06:48 Heikki Linnakangas wrote: LC_CTYPE. In 8.3 and up where we constrain that to match the database encoding, we only have a problem with the C locale. Why don't we apply the same restriction to the C locale then? (1) what

Re: [HACKERS] More message encoding woes

2009-03-30 Thread Tom Lane
Heikki Linnakangas heikki.linnakan...@enterprisedb.com writes: In CVS HEAD, we call bind_textdomain_codeset() in SetDatabaseEncoding() which fixes that, but we only do it on Windows. In earlier versions we called it on all platforms, but only for UTF-8. It seems that we should call

Re: [HACKERS] More message encoding woes

2009-03-30 Thread Heikki Linnakangas
Tom Lane wrote: Another idea is to try the values listed in our encoding_match_list[] until bind_textdomain_codeset succeeds. The problem here is that the GNU documentation is *exceedingly* vague about whether bind_textdomain_codeset behaves sanely (ie throws a recognizable error) when given a
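
The idea of trying each known spelling in turn might be sketched as below. The function name `bind_first_accepted_codeset` and the candidate array are hypothetical stand-ins; the real `encoding_match_list[]` is not reproduced here. The caveat raised in the thread applies: a non-NULL return from `bind_textdomain_codeset()` may only mean the name was stored, not that gettext can actually convert to it.

```c
#include <libintl.h>
#include <stddef.h>

/*
 * Hypothetical helper: try each candidate spelling of an encoding
 * name and return the first one bind_textdomain_codeset() accepts,
 * or NULL if none is accepted (or the list is empty).
 */
static const char *
bind_first_accepted_codeset(const char *domain,
                            const char *const *candidates, size_t n)
{
    for (size_t i = 0; i < n; i++)
    {
        /* A NULL return signals failure for this spelling. */
        if (bind_textdomain_codeset(domain, candidates[i]) != NULL)
            return candidates[i];
    }
    return NULL;
}
```

If the library accepts any string without validating it, the loop always stops at the first candidate, which is exactly the uncertainty discussed in this message.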

Re: [HACKERS] More message encoding woes

2009-03-30 Thread Tom Lane
Heikki Linnakangas heikki.linnakan...@enterprisedb.com writes: Tom Lane wrote: Another idea is to try the values listed in our encoding_match_list[] until bind_textdomain_codeset succeeds. The problem here is that the GNU documentation is *exceedingly* vague about whether

Re: [HACKERS] More message encoding woes

2009-03-30 Thread Heikki Linnakangas
Tom Lane wrote: Where does it get the default codeset from? Maybe we could constrain that to match the database encoding, the way we do for LC_COLLATE/CTYPE? LC_CTYPE. In 8.3 and up where we constrain that to match the database encoding, we only have a problem with the C locale. --

Re: [HACKERS] More message encoding woes

2009-03-30 Thread Tom Lane
Heikki Linnakangas heikki.linnakan...@enterprisedb.com writes: Tom Lane wrote: Where does it get the default codeset from? Maybe we could constrain that to match the database encoding, the way we do for LC_COLLATE/CTYPE? LC_CTYPE. In 8.3 and up where we constrain that to match the database

Re: [HACKERS] More message encoding woes

2009-03-30 Thread Heikki Linnakangas
Tom Lane wrote: Heikki Linnakangas heikki.linnakan...@enterprisedb.com writes: Tom Lane wrote: Where does it get the default codeset from? Maybe we could constrain that to match the database encoding, the way we do for LC_COLLATE/CTYPE? LC_CTYPE. In 8.3 and up where we constrain that to

Re: [HACKERS] More message encoding woes

2009-03-30 Thread Tom Lane
Heikki Linnakangas heikki.linnakan...@enterprisedb.com writes: Tom Lane wrote: Could we get away with just unconditionally calling bind_textdomain_codeset with *our* canonical spelling of the encoding name? If it works, great, and if it doesn't, you get English. Yeah, that's better than

Re: [HACKERS] More message encoding woes

2009-03-30 Thread Heikki Linnakangas
Tom Lane wrote: What we need is an API equivalent to iconv --list, but I'm not seeing one :-(. There's also locale -m. Looking at the implementation of that, it just lists what's in /usr/share/i18n/charmaps. Not too portable either. Do we need to go so far as to try to run that program?

Re: [HACKERS] More message encoding woes

2009-03-30 Thread Zdenek Kotala
Tom Lane wrote on Mon, 30 Mar 2009 at 14:04 -0400: Heikki Linnakangas heikki.linnakan...@enterprisedb.com writes: Tom Lane wrote: Could we get away with just unconditionally calling bind_textdomain_codeset with *our* canonical spelling of the encoding name? If it works, great, and if it