I'd like to propose adding SJIS as a database encoding.  You may wonder why 
SJIS is still necessary in the world of Unicode.  The purpose is to achieve 
comparable performance when migrating legacy database systems from other DBMSs 
with little modification of applications.

Recently, we failed to migrate a customer's legacy database from DBMS-X to 
PostgreSQL.  The customer wanted to use PostgreSQL, but PostgreSQL couldn't 
meet the performance requirement.

The system uses DBMS-X with SJIS as the database character set.  The main 
applications are written in embedded SQL and require SJIS in their host 
variables.  The customer insisted that they cannot use UTF8 for the host 
variables because that would require large modifications to the applications' 
character handling.  As a result, with DBMS-X no character set conversion is 
necessary between the clients and the server.

On the other hand, PostgreSQL doesn't support SJIS as a database encoding.  
Therefore, character set conversion from UTF8 to SJIS has to be performed on 
the results returned to the clients.  The batch application runs millions of 
SELECTs, each of which retrieves more than 100 columns, and many of those 
columns are of character type.
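
To illustrate the kind of per-value work involved, here is a minimal Python 
sketch.  It uses Python's built-in codecs, not PostgreSQL's conversion 
routines, and the row contents (100 columns holding the same short string) 
are made up for illustration only.

```python
# Illustration only: Python's built-in codecs stand in for the server-side
# conversion.  This is not PostgreSQL code.

def utf8_to_sjis(value: bytes) -> bytes:
    """Convert one UTF-8 encoded value to Shift_JIS."""
    return value.decode("utf-8").encode("shift_jis")

# Hypothetical row: 100 character columns, each holding a short Japanese string.
row = ["日本語".encode("utf-8")] * 100

# Every character column has to be converted before it is sent to the client.
converted = [utf8_to_sjis(col) for col in row]

utf8_bytes = sum(len(col) for col in row)
sjis_bytes = sum(len(col) for col in converted)
print(utf8_bytes, sjis_bytes)  # prints: 900 600
```

With millions of SELECTs, this conversion is repeated for every character 
column of every returned row.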

If PostgreSQL supported SJIS, it would match or outperform DBMS-X for these 
applications.  We confirmed this by using psql to run a subset of the batch 
processing.  When the client encoding was SJIS, one FETCH of 10,000 rows took 
about 500ms.  When the client encoding was UTF8 (the same as the database 
encoding), the same FETCH took 270ms.
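
To put those two timings in perspective, the following small Python sketch 
works out the difference.  It only does arithmetic on the figures above; the 
500ms value is approximate, so the derived numbers are rough as well, and 
attributing the whole difference to encoding conversion is an assumption.

```python
# Figures taken from the measurement described above (500 ms is approximate).
rows = 10_000
sjis_ms = 500.0   # FETCH with client encoding SJIS
utf8_ms = 270.0   # FETCH with client encoding UTF8 (no conversion)

# Assumption: the whole difference is due to encoding conversion.
overhead_ms = sjis_ms - utf8_ms
per_row_us = overhead_ms * 1000 / rows
slowdown = sjis_ms / utf8_ms
share = overhead_ms / sjis_ms

print(f"extra time per FETCH: {overhead_ms:.0f} ms")   # 230 ms
print(f"extra time per row:   {per_row_us:.0f} us")    # 23 us
print(f"slowdown factor:      {slowdown:.2f}x")        # 1.85x
print(f"share of SJIS FETCH:  {share:.0%}")            # 46%
```

In other words, under that assumption nearly half of the SJIS FETCH time 
would be conversion overhead.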

Supporting SJIS may help PostgreSQL regain some attention here in Japan, in 
the context of database migration.  BTW, MySQL supports SJIS as a database 
encoding.  PostgreSQL used to be the most popular open source database in 
Japan, but MySQL is now more popular.

What I'm wondering is why PostgreSQL doesn't support SJIS.  Was there any 
technical difficulty?  Is there anything you are worried about in adding SJIS?

I'd like to write a patch for adding SJIS if there's no strong objection.  I'd 
appreciate it if you could point me to good design information for adding a 
server encoding (e.g. the URL of the most recent patch that added a new server 
encoding).

Takayuki Tsunakawa

Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)