Re: [HACKERS] invalidly encoded strings

Alvaro Herrera Tue, 11 Sep 2007 12:28:05 -0700

Tom Lane wrote:
> Jeff Davis <[EMAIL PROTECTED]> writes:
> > On Mon, 2007-09-10 at 23:20 -0400, Tom Lane wrote:
> >> It might work the way you are expecting if the database uses SQL_ASCII
> >> encoding and C locale --- and I'd be fine with allowing convert() only
> >> when the database encoding is SQL_ASCII.
> 
> > I prefer this option.
> 
> I think really the technically cleanest solution would be to make
> convert() return bytea instead of text; then we'd not have to put
> restrictions on what encoding or locale it's working inside of.
> However, it's not clear to me whether there are valid usages that
> that would foreclose.  Tatsuo mentioned length() but bytea has that.


But length(bytea) cannot count characters, only bytes.
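
To illustrate the distinction, here is a minimal sketch in Python rather than SQL. It assumes a Python `bytes` value stands in for bytea and a `str` for text; the sample string is made up for the example.

```python
# Bytes vs. characters: a sketch, with Python bytes standing in for bytea
# and Python str standing in for text (an assumption for illustration).
sample = "ñandú"                 # 5 characters, 2 of them non-ASCII
raw = sample.encode("utf-8")     # the byte representation

byte_count = len(raw)            # what a byte-oriented length would report
char_count = len(sample)         # what a character-oriented length would report

print(byte_count, char_count)
```

In UTF-8 the two non-ASCII letters take two bytes each, so the byte count exceeds the character count.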

> What I think we'd need to have a complete solution is
> 
> convert(text, name) returns bytea
>       -- convert from DB encoding to arbitrary encoding
> 
> convert(bytea, name, name) returns bytea
>       -- convert between any two encodings
> 
> convert(bytea, name) returns text
>       -- convert from arbitrary encoding to DB encoding

That seems good.  This is the encode/decode that other systems have.
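
For comparison, the three proposed signatures map roughly onto encode, transcode, and decode. The following is only a sketch in Python: the function names are hypothetical, and a Python `str` stands in for text in the DB encoding.

```python
# Hypothetical stand-ins for the three proposed convert() variants.
# Assumption: Python str plays the role of "text in the DB encoding",
# and Python bytes plays the role of bytea.

def text_to_bytes(text: str, dest: str) -> bytes:
    """Like convert(text, name): DB encoding -> arbitrary encoding."""
    return text.encode(dest)

def bytes_to_bytes(data: bytes, src: str, dest: str) -> bytes:
    """Like convert(bytea, name, name): between any two encodings."""
    return data.decode(src).encode(dest)

def bytes_to_text(data: bytes, src: str) -> str:
    """Like convert(bytea, name): arbitrary encoding -> DB encoding."""
    return data.decode(src)

latin1 = text_to_bytes("café", "latin-1")
utf8 = bytes_to_bytes(latin1, "latin-1", "utf-8")
back = bytes_to_text(utf8, "utf-8")
```

Only the decode direction yields text, so invalidly encoded input can be rejected at that point instead of ending up in a text value.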

However ISTM we would also need something like

length(bytea, name) returns int
        -- counts the number of characters assuming that the bytea is in
        -- the given encoding

Hmm, I wonder if counting chars is consistent regardless of the
encoding the string is in.  To me it sounds like it should be, in which
case it would work to convert to the DB encoding and count chars there.
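
A quick way to sanity-check that idea, sketched in Python under the assumption that decoding to a `str` plays the role of converting to the DB encoding. The helper name `char_length` is hypothetical. It counts code points, so it says nothing about combining sequences or about encodings that cannot represent the string.

```python
# Sketch of length(bytea, name): decode with the stated encoding, then
# count characters. char_length is a hypothetical name for illustration.

def char_length(data: bytes, encoding: str) -> int:
    # Raises UnicodeDecodeError if data is not valid in the given encoding.
    return len(data.decode(encoding))

sample = "ñandú"
encodings = ["latin-1", "utf-8", "utf-16-le"]

# Same string, different byte lengths per encoding.
byte_lengths = [len(sample.encode(enc)) for enc in encodings]

# Character count after decoding each byte string with its own encoding.
char_lengths = [char_length(sample.encode(enc), enc) for enc in encodings]

print(byte_lengths, char_lengths)
```

For this sample the byte lengths differ per encoding while the character count stays the same, which is consistent with the guess above, though it is one example and not a proof.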

-- 
Alvaro Herrera       Valdivia, Chile   ICBM: S 39º 49' 18.1", W 73º 13' 56.4"
<inflex> really, I see PHP as like a strange amalgamation of C, Perl, Shell
<crab> inflex: you know that "amalgam" means "mixture with mercury",
       more or less, right?
<crab> i.e., "deadly poison"
