Re: [JDBC] Re: A bug with pgsql 7.1/jdbc and non-ascii (8-bit) chars?

Barry Lind Fri, 04 May 2001 18:08:49 -0700

I can see that I'm probably not going to win this argument, but I'll 
take one more try. :-)

The basic issue I have is that the server is providing an API to the 
client to get the character encoding for the database and that API can 
report incorrect information to the client.

If multibyte isn't enabled, getdatabaseencoding() always returns 
SQL_ASCII.  In my understanding SQL_ASCII = 7bit ascii (at least that 
is what the code in backend/utils/mb/conv.c is assuming).  But in 
reality, without multibyte, SQL_ASCII means some unknown single byte 
character encoding, whereas with multibyte enabled SQL_ASCII really 
does mean 7bit ascii.  And as far as I know there is no way for the 
client to know if multibyte is enabled or not.

Thus I would be happy if getdatabaseencoding() returned 'UNKNOWN' or 
something similar when in fact it doesn't know what the encoding is 
(i.e. when not compiled with multibyte).  That way users of this 
function on the client have a means of knowing whether or not the server 
means 7bit ascii.  (Alternatively, having some other function like 
getmultibyteenabled(Y/N) would work as well, because using that value 
you can then determine whether or not to trust the value of 
getdatabaseencoding).
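
As a rough sketch of how a client could use such a signal (Java, since 
the client in question is the JDBC driver): the helper below is 
hypothetical and not part of the driver.  The multibyteEnabled flag 
stands in for the result of the proposed getmultibyteenabled(), which 
does not exist today, and the name mapping covers only the two encodings 
mentioned in this thread.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration; not part of the PostgreSQL JDBC driver.
final class EncodingChoice {
    /**
     * @param reported         value returned by getdatabaseencoding()
     * @param multibyteEnabled assumed flag (the proposed getmultibyteenabled())
     * @return the charset to decode with, or null when the client cannot
     *         trust the reported name and must treat the encoding as unknown
     */
    static Charset pick(String reported, boolean multibyteEnabled) {
        if (!multibyteEnabled) {
            // Without multibyte the server always reports SQL_ASCII,
            // so the reported name carries no information.
            return null;
        }
        switch (reported) {
            case "SQL_ASCII":
                return StandardCharsets.US_ASCII;   // really 7bit ascii here
            case "LATIN1":
                return StandardCharsets.ISO_8859_1;
            default:
                return null;                        // not covered by this sketch
        }
    }
}
```

With something like this, a null result is the point at which the client 
would fall back to a default or a user-supplied encoding instead of 
assuming 7bit ascii.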

I just don't like having an API whose returned value can't be relied on 
as being correct under some circumstances.

thanks,
--Barry

PS.  Note that if multibyte is enabled, the functionality that is being 
complained about here in the jdbc client is apparently ok for the server 
to do.  If you insert a value into a text column on a SQL_ASCII database 
with multibyte enabled and that value contains 8bit characters, those 
8bit characters will be quietly replaced with a dummy character since 
they are invalid for the SQL_ASCII 7bit character set.
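
The client-side version of the same loss can be shown in plain Java.  
This is only a sketch of how Java's own charset conversion behaves, not 
the server's conversion code or the driver's actual code path; the class 
and method names are made up for the example.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; shows Java's charset behaviour only.
final class EightBitDemo {
    /** Decode raw bytes under an assumed charset, then encode them back. */
    static byte[] roundTrip(byte[] raw, Charset assumed) {
        return new String(raw, assumed).getBytes(assumed);
    }

    public static void main(String[] args) {
        byte[] raw = { 'c', 'a', 'f', (byte) 0xE9 };   // last byte has the high bit set

        // Treated as 7bit ascii, the high-bit byte cannot be decoded and
        // comes back as '?' (0x3F) after the round trip.
        byte[] viaAscii = roundTrip(raw, StandardCharsets.US_ASCII);

        // Treated as LATIN1, every byte value maps to a character, so the
        // original bytes survive the round trip unchanged.
        byte[] viaLatin1 = roundTrip(raw, StandardCharsets.ISO_8859_1);

        System.out.println(viaAscii[3] == (byte) '?');
        System.out.println(viaLatin1[3] == (byte) 0xE9);
    }
}
```

A LATIN1 assumption never loses bytes for a single byte encoding, even 
when the characters shown are the wrong ones, while a 7bit ascii 
assumption destroys the data.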


Tom Lane wrote:

> Barry Lind <[EMAIL PROTECTED]> writes:
> 
>> Now it is an easy change in the jdbc code to use LATIN1 when the server 
>> reports SQL_ASCII, but I really dislike hardcoding support that only 
>> works in english speaking countries and Western Europe.
> 
> 
> What's wrong with that?  It won't be any more broken for people who are
> not really using LATIN1, and it will be considerably less broken for
> those who are.  Seems like a net win to me, even without making the
> obvious point about where the majority of Postgres users are.
> 
> It probably would be a good idea to allow the backend to store an
> indication of character set even when not compiled for MULTIBYTE,
> but that's not the issue here.  To me, the issue is whether JDBC
> makes a reasonable effort not to munge data when presented with
> a backend that claims to be using SQL_ASCII (which, let me remind
> you, is the default setting).  Converting high-bit-set characters
> to '?' is almost certainly NOT what the user wants you to do.
> Converting on the assumption of LATIN1 will make a lot of people
> happy, and the people who aren't happy with it will certainly not
> be happy with '?' conversion either.
> 
>> All this does 
>> is move the problem from being one that non-english countries have to 
>> being one where it is a non-english and non-western european problem 
>> (eg. Eastern Europe, Russia, etc.).
> 
> 
> Nonsense.  The non-Western-European folks see broken behavior now
> anyway, unless they compile with MULTIBYTE and set an appropriate
> encoding.  How would this make their lives worse, or even different?
> 
> I'm merely suggesting that the default behavior could be made useful
> to a larger set of people than it now is, without making things any
> worse for those that it's not useful to.
> 
>               regards, tom lane
> 
> 

