Re: [HACKERS] Shouldn't non-MULTIBYTE backend refuse to start in MB database?

2001-02-15 Thread Tom Lane

Tatsuo Ishii writes:
 Okay, so if a database has been built by a backend that knows MULTIBYTE
 and has some "yomigana" info available, then indexes in text columns
 will not be in the same order that strcmp() would put them in, right?

 No. The "yomigana" exists in the application world, not in the
 database engine itself. What I was talking about was an idea to add
 an extra column to a table.

Oh, I see.  So the question still remains: can a MULTIBYTE-aware backend
ever use a sort order different from strcmp() order?  (That is, not as
a result of LOCALE, but just because of the non-SQL-ASCII encoding.)

Actually there are more complicated cases that would depend on more
features of the encoding than just sort order.  Consider

CREATE INDEX fooi ON foo (upper(field1));

Operations involving this index will misbehave if the behavior of
upper() ever differs between MULTIBYTE-aware and non-MULTIBYTE-aware
code.  That seems pretty likely for encodings like LATIN2...
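
To make the upper() concern concrete, here is a minimal sketch in C. It is not PostgreSQL code: both function names are hypothetical, and the LATIN2 rule is simplified. It assumes that bytes 0xE0-0xFE (except 0xF7) are lower-case letters whose upper-case forms sit 0x20 lower, which I believe holds for ISO 8859-2. Any string for which the two rules disagree would be stored under one index key and looked up under another.

```c
/* Hypothetical: upper-casing as an ASCII-only build might do it. */
static unsigned char
ascii_upper(unsigned char c)
{
    if (c >= 'a' && c <= 'z')
        return (unsigned char) (c - 0x20);
    return c;
}

/* Hypothetical: simplified LATIN2 (ISO 8859-2) upper-casing. */
static unsigned char
latin2_upper_sketch(unsigned char c)
{
    if (c >= 'a' && c <= 'z')
        return (unsigned char) (c - 0x20);
    /* Assumed mapping for the accented range; 0xF7 is the division sign. */
    if (c >= 0xE0 && c <= 0xFE && c != 0xF7)
        return (unsigned char) (c - 0x20);
    return c;
}

/* Returns 1 if the two rules compute different index keys for s. */
static int
upper_keys_differ(const unsigned char *s)
{
    for (; *s; s++)
        if (ascii_upper(*s) != latin2_upper_sketch(*s))
            return 1;
    return 0;
}
```

For a pure-ASCII string such as "abc" the two rules agree, so the index stays usable. A string containing byte 0xE1 gets different keys under the two rules.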

regards, tom lane



Re: [HACKERS] Shouldn't non-MULTIBYTE backend refuse to start in MB database?

2001-02-15 Thread Tom Lane

Tatsuo Ishii writes:
 Oh, I see.  So the question still remains: can a MULTIBYTE-aware backend
 ever use a sort order different from strcmp() order?  (That is, not as
 a result of LOCALE, but just because of the non-SQL-ASCII encoding.)
 
 According to the code, no, because varstr_cmp() doesn't pay attention to
 the multibyte status.  Presumably strcmp() and strcoll() don't either.

 Right.

OK, so I guess this comes down to a judgment call: should we insert the
check in the non-MULTIBYTE case, or not?  I still think it's safest to
do so, but I'm not sure what you want to do.

regards, tom lane



[HACKERS] Shouldn't non-MULTIBYTE backend refuse to start in MB database?

2001-02-14 Thread Tom Lane

We now have defenses against running a non-LOCALE-enabled backend in a
database that was created in non-C locale.  Shouldn't we likewise
prevent a non-MULTIBYTE-enabled backend from running in a database with
a multibyte encoding that's not SQL_ASCII?  Or am I missing a reason why
that is safe?

I propose the following addition to ReverifyMyDatabase in postinit.c:

  #ifdef MULTIBYTE
      SetDatabaseEncoding(dbform->encoding);
+ #else
+     if (dbform->encoding != SQL_ASCII)
+         elog(FATAL, "some suitable error message");
  #endif
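
The same decision, pulled out into a standalone function so it can be read and tested in isolation. This is only a sketch under stated assumptions: check_db_encoding() and its multibyte_build flag are hypothetical stand-ins for the #ifdef, the value 0 for SQL_ASCII is assumed here, and a boolean result stands in for elog(FATAL).

```c
#include <stdbool.h>

#define SQL_ASCII 0             /* assumed encoding ID for this sketch */

/*
 * Hypothetical stand-in for the proposed check.
 * Returns true if the backend may run in the database.
 */
static bool
check_db_encoding(int db_encoding, bool multibyte_build)
{
    if (multibyte_build)
        return true;            /* real code would set the encoding here */

    /* A non-MULTIBYTE build only understands SQL_ASCII databases. */
    return db_encoding == SQL_ASCII;
}
```

A MULTIBYTE build accepts any encoding, while a non-MULTIBYTE build accepts only SQL_ASCII and would refuse to start otherwise.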

Comments?

regards, tom lane



Re: [HACKERS] Shouldn't non-MULTIBYTE backend refuse to start in MB database?

2001-02-14 Thread Tom Lane

Peter Eisentraut writes:
 Tom Lane writes:
 We now have defenses against running a non-LOCALE-enabled backend in a
 database that was created in non-C locale.  Shouldn't we likewise
 prevent a non-MULTIBYTE-enabled backend from running in a database with
 a multibyte encoding that's not SQL_ASCII?  Or am I missing a reason why
 that is safe?

 Not all multibyte encodings are actually "multi"-byte, e.g., LATIN2.  In
 that case the main benefit is the on-the-fly recoding between the client
 and the server.  If a non-MB server encounters that database it should
 still work.

Are these encodings all guaranteed to have the same collation order as
SQL_ASCII?  If not, we have the same index corruption issues as for LOCALE.
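
A small C sketch of what "index corruption" means here. It is not backend code: probe() is a plain binary search standing in for an index lookup, and nocase_cmp() is a hypothetical stand-in for any collation that differs from strcmp() order. The keys are stored in nocase_cmp() order, then probed with each comparator.

```c
#include <ctype.h>
#include <string.h>

typedef int (*cmp_fn) (const char *, const char *);

/* Hypothetical "other" collation: compare ignoring ASCII case. */
static int
nocase_cmp(const char *a, const char *b)
{
    for (;; a++, b++)
    {
        int ca = tolower((unsigned char) *a);
        int cb = tolower((unsigned char) *b);

        if (ca != cb || ca == 0)
            return ca - cb;
    }
}

/* Binary search standing in for an index probe; returns position or -1. */
static int
probe(const char *const *keys, int n, const char *key, cmp_fn cmp)
{
    int lo = 0;
    int hi = n;

    while (lo < hi)
    {
        int mid = lo + (hi - lo) / 2;
        int c = cmp(key, keys[mid]);

        if (c == 0)
            return mid;
        if (c < 0)
            hi = mid;
        else
            lo = mid + 1;
    }
    return -1;
}

/* Keys in the order nocase_cmp() would store them. */
static const char *const index_keys[] = {
    "alpha", "Bravo", "charlie", "Delta", "echo"
};
```

Probing for "Delta" with nocase_cmp() finds it at position 3. Probing the same array with strcmp() walks the wrong way at "charlie" and reports the key as missing, even though it is present.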

regards, tom lane



Re: [HACKERS] Shouldn't non-MULTIBYTE backend refuse to start in MB database?

2001-02-14 Thread Tom Lane

Tatsuo Ishii writes:
 Are these encodings all guaranteed to have the same collation order as
 SQL_ASCII?

 Yes and no.

Um, I'm confused ...

 If not, we have the same index corruption issues as for LOCALE.

 If the backend is configured with LOCALE enabled and the database is
 not configured with LOCALE, we will have a problem. But this will
 happen with or without MULTIBYTE anyway. Multibyte support has nothing
 to do with LOCALE support.

Can a backend configured with MULTIBYTE and running in non-SQL_ASCII
encoding ever sort strings in non-character-code ordering, even if it
is in C locale?  I should think that such behavior is highly likely
for multibyte character sets.

If it can, then we mustn't allow a non-MULTIBYTE backend to run in
such a database, I think.

regards, tom lane