Re: [HACKERS] Supporting SJIS as a database encoding

Kyotaro HORIGUCHI Mon, 05 Sep 2016 20:31:10 -0700

Hello,

At Mon, 5 Sep 2016 19:38:33 +0300, Heikki Linnakangas <[email protected]> wrote 
in <[email protected]>
> On 09/05/2016 05:47 PM, Tom Lane wrote:
> > "Tsunakawa, Takayuki" <[email protected]> writes:
> >> Before digging into the problem, could you share your impression on
> >> whether PostgreSQL can support SJIS?  Would it be hopeless?
> >
> > I think it's pretty much hopeless.
> 
> Agreed.


+1, even as a user of SJIS:)

> But one thing that would help a little, would be to optimize the UTF-8
> -> SJIS conversion. It uses a very generic routine, with a binary
> search over a large array of mappings. I bet you could do better than
> that, maybe using a hash table or a radix tree instead of the large
> binary-searched array.

I'm very impressed by the idea. Mean number of iterations for
binsearch on current conversion table with 8000 characters is
about 13 and the table size is under 100kBytes (maybe).

A three-level array with 2 byte values will take about 1.6~2MB of memory.

A radix tree for UTF-8->some-encoding conversion requires about,
or up to.. (using 1 byte index to point the next level)

(1 *  ((7f + 1) +
      (df - c2 + 1) * (bf - 80 + 1) +
      (ef - e0 + 1) * (bf - 80 + 1)^2)) = 67 kbytes.

SJIS characters are 2byte length at longest so about 8000
characters takes extra 16 k Bytes. And some padding space will be
added on them.

As the result, radix tree seems to be promising because of small
requirement of additional memory and far less comparisons.  Also
Big5 and other encodings including EUC-* will get benefit from
it.

Implementing radix tree code, then redefining the format of
mapping table to suppot radix tree, then modifying mapping
generator script are needed.

If no one oppse to this, I'll do that.

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




-- 
Sent via pgsql-hackers mailing list ([email protected])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Re: [HACKERS] Supporting SJIS as a database encoding

Reply via email to