Mark Dilger wrote:
Andrew - Supernews wrote:
On 2007-04-01, Mark Dilger <[EMAIL PROTECTED]> wrote:
Do any of the string functions (see
http://www.postgresql.org/docs/8.2/interactive/functions-string.html)
run the risk of generating invalid utf8 encoded strings? Do I need
to add checks?
Are there known bugs with these functions in this regard?
The chr() function returns an octet, rather than a character; this is
clearly
wrong and needs fixing.
Ok, I've altered the chr() function. I am including a transcript from
psql below. There are several design concerns:
1) In the current implementation, chr(0) returns a 5-byte text object
(4-bytes of overhead plus one byte of data) containing a null. In the
new implementation, this returns an error. I don't know, but it is
possible that people currently use things like "SELECT chr(0) || chr(0)
|| ..." to build up strings of nulls.
2) Under utf8, chr(X) fails for X = 128..255. This may also break
current users expectations.
3) The implicit modulus operation that was being performed by chr() is
now gone, which might break some users.
4) You can't represent the high end of the astral plain with type
INTEGER, unless you pass in a negative value, which is somewhat
unintuitive. Since chr() expects an integer (and not a bigint) the user
needs handle the sign bit correctly.
mark
---------------------
Welcome to psql 8.3devel, the PostgreSQL interactive terminal.
Type: \copyright for distribution terms
\h for help with SQL commands
\? for help with psql commands
\g or terminate with semicolon to execute query
\q to quit
pgsql=# select chr(0);
ERROR: character 0x00 of encoding "SQL_ASCII" has no equivalent in "UTF8"
pgsql=# select chr(65);
chr
-----
A
(1 row)
pgsql=# select chr(128);
ERROR: character 0x80 of encoding "SQL_ASCII" has no equivalent in "UTF8"
pgsql=# select chr(53398);
chr
-----
Ж
(1 row)
pgsql=# select chr(14989485);
chr
-----
中
(1 row)
pgsql=# select chr(4036005254);
ERROR: function chr(bigint) does not exist
LINE 1: select chr(4036005254);
^
HINT: No function matches the given name and argument types. You might
need to add explicit type casts.
Here's the code for the new chr() function:
Datum
chr(PG_FUNCTION_ARGS)
{
int32 cvalue = PG_GETARG_INT32(0);
text *result;
if (pg_database_encoding_max_length() > 1 && !lc_ctype_is_c())
{
int encoding,
len,
byteoff;
uint32 buf[2];
const char *bufptr;
encoding = GetDatabaseEncoding();
buf[0] = htonl(cvalue);
buf[1] = 0;
bufptr = (const char *)&buf;
for (byteoff = 0; byteoff < sizeof(uint32) && 0 == *bufptr; ++byteoff,
++bufptr);
len = pg_encoding_mblen(encoding,bufptr);
if (byteoff + len != sizeof(uint32) || !pg_verify_mbstr(encoding,
bufptr, len, true /* noError */))
report_untranslatable_char(PG_SQL_ASCII, encoding, bufptr,
sizeof(int32));
result = (text *) palloc(VARHDRSZ + len);
SET_VARSIZE(result, VARHDRSZ + len);
memcpy(VARDATA(result),bufptr,len);
}
else
{
result = (text *) palloc(VARHDRSZ + 1);
SET_VARSIZE(result, VARHDRSZ + 1);
*VARDATA(result) = (char) cvalue;
}
PG_RETURN_TEXT_P(result);
}
---------------------------(end of broadcast)---------------------------
TIP 7: You can help support the PostgreSQL project by donating at
http://www.postgresql.org/about/donate