We have a database that stores strings in various encodings (and
non-encodings, namely the arbitrary byte soup that you might see in
email headers from the internet). For this reason, the database uses
the sql_ascii encoding. The columns are of type text because most
characters are ascii, so bytea didn't seem the right way to go.

We are currently on 8.3 and are trying to upgrade to 9.1, but our
plperlu functions are acting up.

Old behavior on 8.3 .. 9.0:

sql_ascii =# create or replace function whitespace(text) returns text
language plperlu as $$ $a = shift; $a =~ s/[\t ]+/ /g; return $a; $$;

sql_ascii =# select whitespace (E'\200'); -- 0x80 is not valid utf-8

sql_ascii =# select whitespace (E'\200')::bytea;

New behavior on 9.1.2:

sql_ascii =# select whitespace (E'\200');
ERROR:  XX000: Malformed UTF-8 character (fatal) at line 1.
CONTEXT:  PL/Perl function "whitespace"
LOCATION:  plperl_call_perl_func, plperl.c:2037

A crude workaround is:

sql_ascii =# create or replace function whitespace_utf8_off(text)
returns text language plperlu as $$ use Encode; $a = shift;
Encode::_utf8_off($a); $a =~ s/[\t ]+/ /g; return $a; $$;

sql_ascii =# select whitespace_utf8_off (E'\200');

sql_ascii =# select whitespace_utf8_off (E'\200')::bytea;

(Note that the workaround is not perfect: the resulting 0x80..0xff
bytes are still tagged as utf8.)

I think the bug is in plperl_helpers.h:

/*
 * Create a new SV from a string assumed to be in the current database's
 * encoding.
 */
static inline SV *
cstr2sv(const char *str)
{
        SV                 *sv;
        char       *utf8_str = utf_e2u(str);

        sv = newSVpv(utf8_str, 0);
        SvUTF8_on(sv);

        return sv;
}

In sql_ascii databases, utf_e2u does not do any recoding, but
SvUTF8_on still marks the string as utf-8 even though it isn't.
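
To illustrate the point, here is a rough standalone sketch in plain C. It
is not the plperl code and does not use the Perl or PostgreSQL headers:
mock_sv, mock_encoding, mock_cstr2sv and is_valid_utf8 are all made-up
names for illustration. It models a scalar as bytes plus a "utf8" flag
and sets that flag only when the database encoding is not sql_ascii,
which is one possible way the flagging could be made conditional.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the database encoding. */
typedef enum { MOCK_SQL_ASCII, MOCK_UTF8 } mock_encoding;

/* Hypothetical stand-in for a Perl scalar: raw bytes plus a utf8 flag. */
typedef struct {
    const char *bytes;
    size_t      len;
    bool        utf8_flag;
} mock_sv;

/*
 * Structural UTF-8 check: verifies lead bytes and continuation bytes.
 * It is only a sketch and does not reject every overlong or surrogate
 * sequence.
 */
static bool is_valid_utf8(const unsigned char *s, size_t len)
{
    size_t i = 0;

    while (i < len) {
        unsigned char c = s[i];
        size_t need;

        if (c < 0x80)
            need = 0;                   /* plain ASCII */
        else if (c >= 0xC2 && c <= 0xDF)
            need = 1;                   /* 2-byte sequence */
        else if (c >= 0xE0 && c <= 0xEF)
            need = 2;                   /* 3-byte sequence */
        else if (c >= 0xF0 && c <= 0xF4)
            need = 3;                   /* 4-byte sequence */
        else
            return false;               /* stray continuation or bad lead */

        if (need > len - i - 1)
            return false;               /* sequence is truncated */
        for (size_t k = 1; k <= need; k++)
            if ((s[i + k] & 0xC0) != 0x80)
                return false;           /* not a continuation byte */
        i += need + 1;
    }
    return true;
}

/* Wrap a C string, flagging it as utf8 only outside sql_ascii. */
static mock_sv mock_cstr2sv(const char *str, mock_encoding db_encoding)
{
    mock_sv sv;

    assert(str != NULL);
    sv.bytes = str;
    sv.len = strlen(str);
    sv.utf8_flag = (db_encoding != MOCK_SQL_ASCII);
    return sv;
}
```

With this model, mock_cstr2sv("\x80", MOCK_SQL_ASCII) yields a scalar
whose flag is off, and is_valid_utf8 rejects the lone 0x80 byte, which
is the case that currently triggers the "Malformed UTF-8" error.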

(Returned values might also need fixing.)

In my view, this is clearly a bug in pl/perl on sql_ascii databases.

c...@df7cb.de | http://www.df7cb.de/
