Hi Martin,

Now I have some time to explain the problem. I will try to give a more comprehensive explanation than usual of the i18n implementation of OpenCA.

The current OpenCA::DBI tries to set the character set with the 'SET NAMES'
SQL command, taking the value from the local locale. This SQL command is
not supported by Oracle (version 9.2).

Instead, Oracle uses either
'ALTER DATABASE <dbname> CHARACTER SET <charset>' or
'ALTER DATABASE <dbname> NATIONAL CHARACTER SET <national charset>'
to *permanently* set the desired character set for the database.

Now it would of course be easy to simply issue the commands manually
in the SQL monitor and ignore OpenCA's complaining about the failed
'SET NAMES' command.

Before doing so I was wondering to which character set I should set
the database - ISO-8859-1 would work fine for Germany but surely would
produce problems in, say, eastern Europe.

Any ideas? How about UTF8? Will OpenCA be able to handle this
correctly?

By the way: what is the point in setting the character set in the
Database depending on the current locale?

The system administrator might decide to change the locale for the
system, so this could influence all data in the database as well
if I am not completely wrong.

Generally we must support all languages, because a university like mine needs to give users the chance to use their own native language if it is available (we call this service :) ).


Today
-----

web browser --> apache --> openca --> database

1. The web browser uses the encoding of the OpenCA language.
2. We try to set the database to the encoding of the OpenCA language.
3. The content in the database is stored in the encoding of the language in use.

Problems:
1. What happens if a Japanese user inserts data in EUC-JP and the RA Operator uses ISO-8859-1?
2. What happens if I replace a string in an ISO-8859-1 text with an EUC-JP encoded string?
3. What does OpenSSL think about EUC-JP?
4. Several databases don't like different encodings in one table.
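
To make problems 1 and 2 concrete, here is a minimal sketch in Python (not OpenCA code; the sample strings and the helper decodes_as are my own assumptions, chosen only for illustration). It shows that EUC-JP bytes read as ISO-8859-1 never raise an error but silently produce the wrong text, and that a byte string mixing both encodings is no longer valid EUC-JP or UTF-8.

```python
# Illustration only: sample values and helper names are hypothetical.
name = "山田"  # a name as a Japanese user might enter it

# The bytes as they would be stored when the user's language uses EUC-JP.
euc_jp_bytes = name.encode("euc_jp")

# Problem 1: the RA operator's side interprets the same bytes as ISO-8859-1.
# ISO-8859-1 assigns a character to every byte value, so decoding cannot
# fail; it just yields different characters than the user typed.
seen_by_operator = euc_jp_bytes.decode("iso-8859-1")

# Problem 2: an EUC-JP string is spliced into an ISO-8859-1 text.
latin1_text = "Straße: ".encode("iso-8859-1")
mixed = latin1_text + euc_jp_bytes


def decodes_as(data: bytes, encoding: str) -> bool:
    """Return True if data is a valid byte sequence in the given encoding."""
    try:
        data.decode(encoding)
        return True
    except UnicodeDecodeError:
        return False


print("operator sees the original name:", seen_by_operator == name)
print("mixed is valid EUC-JP:", decodes_as(mixed, "euc_jp"))
print("mixed is valid UTF-8:", decodes_as(mixed, "utf-8"))
print("mixed decodes as ISO-8859-1:", decodes_as(mixed, "iso-8859-1"))
```

The last check succeeds only because ISO-8859-1 accepts any byte, so a successful decode says nothing about whether the text is right.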


Future
------

The optimal solution would be if we only had to deal with UTF-8. This means the user would only receive HTML data which is UTF-8 encoded. The databases would be happy. Perl is already aware of UTF-8.

The language databases are the biggest problem. There are several open issues, because I'm a beginner in this area:

1. How can we migrate the existing translations to UTF-8?
2. We need a howto for editing the translations. Can somebody describe a working UTF-8 environment for translating?
3. Is it possible to convert between the encodings automatically?
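
On questions 1 and 3: as far as I can tell, the conversion itself is mechanical as long as we know the source encoding of each translation. Below is a minimal sketch in Python, not a finished migration tool. The mapping LEGACY_ENCODINGS and the functions to_utf8 and migrate are hypothetical names, and the encodings listed per language are assumptions that would have to be checked against the real translation files.

```python
# Hypothetical mapping from language to the legacy encoding of its
# translation data. These values are assumptions for illustration.
LEGACY_ENCODINGS = {
    "de_DE": "iso-8859-1",
    "pl_PL": "iso-8859-2",
    "ja_JP": "euc_jp",
}


def to_utf8(data: bytes, source_encoding: str) -> bytes:
    """Re-encode data from source_encoding to UTF-8.

    Decoding is strict, so bytes that are invalid in the source encoding
    raise UnicodeDecodeError instead of being converted silently.
    """
    return data.decode(source_encoding).encode("utf-8")


def migrate(catalogs: dict) -> dict:
    """Convert every catalog (language -> raw bytes) to UTF-8 bytes."""
    return {
        lang: to_utf8(raw, LEGACY_ENCODINGS[lang])
        for lang, raw in catalogs.items()
    }


# Sample data standing in for the content of real translation files.
catalogs = {
    "de_DE": "Schlüssel".encode("iso-8859-1"),
    "pl_PL": "Zażółć".encode("iso-8859-2"),
    "ja_JP": "証明書".encode("euc_jp"),
}

converted = migrate(catalogs)
for lang, raw in converted.items():
    print(lang, raw.decode("utf-8"))
```

The weak point is the mapping: single-byte encodings such as ISO-8859-1 and ISO-8859-2 accept any byte sequence, so a wrong entry converts without error and only shows up as wrong characters afterwards. Somebody who reads the language should look at the result.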


Martin, I think I produced more questions than answers, but perhaps somebody like Janez has a little more know-how than I do and can give us some hints.

Michael
--
_______________________________________________________________

Michael Bell                    Humboldt-Universitaet zu Berlin

Tel.: +49 (0)30-2093 2482       ZE Computer- und Medienservice
Fax:  +49 (0)30-2093 2704       Unter den Linden 6
[EMAIL PROTECTED]   D-10099 Berlin
_______________________________________________________________

