[HACKERS] What is the maximum encoding-conversion growth rate, anyway?

Tom Lane Mon, 28 May 2007 09:57:19 -0700

I just rearranged the code in mbutils.c a little bit to make it more
robust if conversion of an over-length string is attempted, and noted
this comment:


/*
 * When converting strings between different encodings, we assume that space
 * for converted result is 4-to-1 growth in the worst case. The rate for
 * currently supported encoding pairs are within 3 (SJIS JIS X0201 half width
 * kanna -> UTF8 is the worst case).  So "4" should be enough for the moment.
 *
 * Note that this is not the same as the maximum character width in any
 * particular encoding.
 */
#define MAX_CONVERSION_GROWTH  4
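
To make the worst case named in that comment concrete, here is a minimal
standalone sketch. It is not PostgreSQL code, and the function name is
made up for illustration. It relies on the fact that the Shift JIS
half-width katakana bytes 0xA1..0xDF (JIS X 0201) map linearly onto
U+FF61..U+FF9F, all of which need three bytes in UTF-8, so one input
byte becomes three output bytes.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Illustration only, not PostgreSQL code: convert one Shift JIS half-width
 * katakana byte (JIS X 0201, 0xA1..0xDF) to UTF-8.  These bytes map
 * linearly onto U+FF61..U+FF9F, which always take three bytes in UTF-8.
 * Returns the number of bytes written to out (always 3).
 */
static size_t
halfwidth_kana_to_utf8(unsigned char sjis, unsigned char out[3])
{
    unsigned int cp;

    assert(sjis >= 0xA1 && sjis <= 0xDF);
    cp = 0xFF61 + (unsigned int) (sjis - 0xA1);

    /* Standard three-byte UTF-8 layout: 1110xxxx 10xxxxxx 10xxxxxx */
    out[0] = (unsigned char) (0xE0 | (cp >> 12));
    out[1] = (unsigned char) (0x80 | ((cp >> 6) & 0x3F));
    out[2] = (unsigned char) (0x80 | (cp & 0x3F));
    return 3;
}
```

For example, the single SJIS byte 0xB1 comes out as the three UTF-8
bytes EF BD B1, which is the 1-to-3 growth the comment refers to.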

It strikes me that this is overly pessimistic, since we do not support
5- or 6-byte UTF8 characters, and AFAICS there are no 1-byte characters
in any supported encoding that require 4 bytes in another.  Could we
reduce the multiplier to 3?  Or even 2?  This has a direct impact on the
longest COPY lines we can support, so I'd like it not to be larger than
necessary.
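
As a rough sketch of why the multiplier matters for line length: if the
converted result has to fit in a single allocation, the longest input
that can safely be converted is about the allocation cap divided by the
growth factor. The code below is illustrative only; the constant and
function names are hypothetical, the cap of 1 GB - 1 is assumed to match
MaxAllocSize, and the exact check in mbutils.c may be written
differently.

```c
#include <assert.h>
#include <stddef.h>

/* Assumption: allocation cap of 1 GB - 1, taken to match MaxAllocSize. */
#define SKETCH_MAX_ALLOC ((size_t) 0x3fffffff)

/*
 * Hypothetical helper: largest input length (in bytes) whose worst-case
 * converted result, plus one byte for the terminating NUL, still fits
 * under the allocation cap for a given growth factor.
 */
static size_t
max_convertible_len(size_t growth)
{
    assert(growth > 0);
    return (SKETCH_MAX_ALLOC - 1) / growth;
}
```

Under these assumptions a factor of 4 allows inputs of 268435455 bytes
(about 256 MB), a factor of 3 allows 357913940 bytes (about 341 MB),
and a factor of 2 allows 536870911 bytes (about 512 MB).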

                        regards, tom lane

