Re: [HACKERS] Shouldn't non-MULTIBYTE backend refuse to start in MB database?

2001-02-16 Thread Tatsuo Ishii

 Tatsuo Ishii [EMAIL PROTECTED] writes:
  Oh, I see.  So the question still remains: can a MULTIBYTE-aware backend
  ever use a sort order different from strcmp() order?  (That is, not as
  a result of LOCALE, but just because of the non-SQL-ASCII encoding.)
  
  According to the code, no, because varstr_cmp() doesn't pay attention to
  the multibyte status.  Presumably strcmp() and strcoll() don't either.
 
  Right.
 
 OK, so I guess this comes down to a judgment call: should we insert the
 check in the non-MULTIBYTE case, or not?  I still think it's safest to
 do so, but I'm not sure what you want to do.
 
   regards, tom lane

I have discussed this issue with Japanese hackers, including Hiroshi.
We have reached the conclusion that your proposal is appropriate and
will make PostgreSQL more stable.
--
Tatsuo Ishii



Re: [HACKERS] Shouldn't non-MULTIBYTE backend refuse to start in MB database?

2001-02-15 Thread Tatsuo Ishii

 Tatsuo Ishii [EMAIL PROTECTED] writes:
  If we sort these strings using strcmp(), we would get:
  ...
  This result might not be perfect, but it is reasonable for most cases
  since the code value of each character in EUC_JP is defined in the hope
  that it can be sorted by its physical value.
 
  If we are not satisfied with this result for some reason, we could
  add an auxiliary "yomigana" field to get the correct order (yomigana
  is the pronunciation of KANJI).
 
 Okay, so if a database has been built by a backend that knows MULTIBYTE
 and has some "yomigana" info available, then indexes in text columns
 will not be in the same order that strcmp() would put them in, right?

No. The "yomigana" exists in the application world, not in the
database engine itself. What I was talking about was an idea to add
an extra column to a table.

create table t1 (
   kanji text,  -- KANJI field
   yomigana text -- YOMIGANA field
);

The query would be something like:

select kanji from t1 order by yomigana;
--
Tatsuo Ishii



Re: [HACKERS] Shouldn't non-MULTIBYTE backend refuse to start in MB database?

2001-02-15 Thread Tatsuo Ishii

 Tom Lane writes:
 
  Oh, I see.  So the question still remains: can a MULTIBYTE-aware backend
  ever use a sort order different from strcmp() order?  (That is, not as
  a result of LOCALE, but just because of the non-SQL-ASCII encoding.)
 
 According to the code, no, because varstr_cmp() doesn't pay attention to
 the multibyte status.  Presumably strcmp() and strcoll() don't either.

Right.
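
To illustrate what "strcmp() order" means for multibyte data, here is a
minimal sketch in C. It is not the actual varstr_cmp() code:
byte_order_cmp is a hypothetical helper that compares two strings purely
by unsigned byte value, with no knowledge of the encoding, which is how
the comparison behaves when LOCALE is not involved.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper (not the real varstr_cmp): compare two byte strings
 * without looking at the encoding. Bytes are compared as unsigned values;
 * if one string is a prefix of the other, the shorter one sorts first.
 */
int
byte_order_cmp(const char *a, size_t alen, const char *b, size_t blen)
{
    size_t n = (alen < blen) ? alen : blen;
    int    r = memcmp(a, b, n);   /* memcmp compares bytes as unsigned char */

    if (r != 0)
        return r;
    if (alen == blen)
        return 0;
    return (alen < blen) ? -1 : 1;
}
```

Under this comparison two EUC_JP strings sort by the code values of their
bytes, and any ASCII string sorts before any string that starts with a
multibyte character, since multibyte lead bytes have the high bit set.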

  Actually there are more complicated cases that would depend on more
  features of the encoding than just sort order.  Consider
 
  CREATE INDEX fooi ON foo (upper(field1));
 
  Operations involving this index will misbehave if the behavior of
  upper() ever differs between MULTIBYTE-aware and non-MULTIBYTE-aware
  code.  That seems pretty likely for encodings like LATIN2...
 
 Of course in the most general case this is a problem, because a function
 can be implemented totally differently depending on any old #ifdef or
 other external factors.
 
 If the multibyte users think this check is okay, then I don't mind, since
 it's usually what the users would want anyway.  I'm just pointing out the
 technical issues.

Right. However, Tom's point is a little bit different, I guess.

As far as I know, most built-in functions taking string data types as
their arguments behave the same with and without MULTIBYTE. The
exceptions I know of include:

char_length
quote_ident
quote_literal
ascii
to_ascii

So, for example, 

CREATE INDEX fooi ON foo (char_length(field1));

would behave differently with/without MULTIBYTE if the encoding for
the database is not "single byte type".
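
A rough sketch in C of why the two builds would disagree, assuming the
database encoding is EUC_JP. The function name euc_jp_char_count is
hypothetical and this is not the backend's implementation; it only shows
that a character count and a byte count diverge once multibyte characters
appear.

```c
#include <stddef.h>

/*
 * Hypothetical helper: count characters in an EUC_JP byte string.
 * Assumes well-formed input; a truncated trailing sequence is counted
 * as one character.
 */
size_t
euc_jp_char_count(const unsigned char *s, size_t nbytes)
{
    size_t i = 0;
    size_t count = 0;

    while (i < nbytes)
    {
        size_t width;

        if (s[i] == 0x8e)
            width = 2;          /* SS2 + one byte: half-width katakana */
        else if (s[i] == 0x8f)
            width = 3;          /* SS3 + two bytes: JIS X 0212 */
        else if (s[i] >= 0xa1 && s[i] <= 0xfe)
            width = 2;          /* two-byte JIS X 0208 character */
        else
            width = 1;          /* ASCII or any other single byte */

        i += width;
        count++;
    }
    return count;
}
```

For a three-byte string holding one two-byte kana followed by "a", this
counts 2 characters, while a build that treats every byte as a character
would report 3. An index built on the first value and probed with the
second would not match.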
--
Tatsuo Ishii



Re: [HACKERS] Shouldn't non-MULTIBYTE backend refuse to start in MB database?

2001-02-14 Thread Tatsuo Ishii

 Peter Eisentraut [EMAIL PROTECTED] writes:
  Tom Lane writes:
  We now have defenses against running a non-LOCALE-enabled backend in a
  database that was created in non-C locale.  Shouldn't we likewise
  prevent a non-MULTIBYTE-enabled backend from running in a database with
  a multibyte encoding that's not SQL_ASCII?  Or am I missing a reason why
  that is safe?
 
  Not all multibyte encodings are actually "multi"-byte, e.g., LATIN2.  In
  that case the main benefit is the on-the-fly recoding between the client
  and the server.  If a non-MB server encounters that database it should
  still work.
 
 Are these encodings all guaranteed to have the same collation order as
 SQL_ASCII?

Yes & no.

 If not, we have the same index corruption issues as for LOCALE.

If the backend is configured with LOCALE enabled and the database is
not configured with LOCALE, we will have a problem. But this will
happen with or without MULTIBYTE anyway. Multibyte support has nothing
to do with LOCALE support.
--
Tatsuo Ishii



Re: [HACKERS] Shouldn't non-MULTIBYTE backend refuse to start in MB database?

2001-02-14 Thread Tatsuo Ishii

 We now have defenses against running a non-LOCALE-enabled backend in a
 database that was created in non-C locale.  Shouldn't we likewise
 prevent a non-MULTIBYTE-enabled backend from running in a database with
 a multibyte encoding that's not SQL_ASCII?  Or am I missing a reason why
 that is safe?
 
 I propose the following addition to ReverifyMyDatabase in postinit.c:
 
   #ifdef MULTIBYTE
   SetDatabaseEncoding(dbform->encoding);
 + #else
 + if (dbform->encoding != SQL_ASCII)
 +     elog(FATAL, "some suitable error message");
   #endif
 
 Comments?

Running a non-MULTIBYTE-enabled backend on a database with a multibyte
encoding other than SQL_ASCII should be safe as long as:

1) access is read-only
2) the encoding is actually a single-byte encoding

If a database with a multibyte encoding is updated by a
non-MULTIBYTE-enabled backend, there is a chance that data could be
corrupted, since the backend does not handle multibyte strings correctly.

So I think your suggestion is an improvement.
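
As a self-contained sketch of the rule the proposed check enforces
(written outside the backend, so it can be compiled on its own): the
name backend_may_open_database and the constant SKETCH_SQL_ASCII are
hypothetical, and the value 0 for SQL_ASCII is an assumption made for
this example only.

```c
#include <stdbool.h>

/* Assumption for this sketch: encoding id 0 stands for SQL_ASCII. */
#define SKETCH_SQL_ASCII 0

/*
 * Hypothetical predicate mirroring the proposed check: a backend built
 * with MULTIBYTE accepts any database encoding, while a backend built
 * without MULTIBYTE accepts only SQL_ASCII databases.
 */
bool
backend_may_open_database(bool multibyte_enabled, int db_encoding)
{
    if (multibyte_enabled)
        return true;
    return db_encoding == SKETCH_SQL_ASCII;
}
```

Note that this rule is stricter than the two safe cases listed above:
it also rejects read-only access and single-byte encodings such as
LATIN2 when the backend lacks MULTIBYTE.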
--
Tatsuo Ishii