Hello,

At Mon, 5 Sep 2016 19:38:33 +0300, Heikki Linnakangas <hlinn...@iki.fi> wrote in <529db688-72fc-1ca2-f898-b0b99e300...@iki.fi>
> On 09/05/2016 05:47 PM, Tom Lane wrote:
> > "Tsunakawa, Takayuki" <tsunakawa.ta...@jp.fujitsu.com> writes:
> >> Before digging into the problem, could you share your impression on
> >> whether PostgreSQL can support SJIS? Would it be hopeless?
> >
> > I think it's pretty much hopeless.
>
> Agreed.
+1, even as a user of SJIS :)

> But one thing that would help a little, would be to optimize the UTF-8
> -> SJIS conversion. It uses a very generic routine, with a binary
> search over a large array of mappings. I bet you could do better than
> that, maybe using a hash table or a radix tree instead of the large
> binary-searched array.

I'm very impressed by the idea.

The mean number of binary search iterations over the current conversion table, which has about 8000 characters, is about 13 (log2(8000) is about 13), and the table size is probably under 100 kbytes. A three-level array with 2-byte values would take about 1.6~2 MB of memory.

A radix tree for UTF-8 -> some-encoding conversion requires at most about the following (using a 1-byte index to point to the next level):

  1 * ((0x7f + 1) + (0xdf - 0xc2 + 1) * (0xbf - 0x80 + 1) + (0xef - 0xe0 + 1) * (0xbf - 0x80 + 1)^2)
  = 128 + 30 * 64 + 16 * 64^2 = 67584 bytes, about 67 kbytes.

SJIS characters are at most 2 bytes long, so about 8000 characters take an extra 16 kbytes, plus some padding. As a result, a radix tree seems promising: it needs little additional memory and far fewer comparisons. Big5 and other encodings, including EUC-*, would also benefit from it.

This requires implementing the radix tree code, redefining the format of the mapping tables to support the radix tree, and modifying the mapping generator script. If no one opposes this, I'll do it.

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)