Re: Encode, take two

Jarkko Hietaniemi Wed, 13 Sep 2000 06:33:14 -0700
On Wed, Sep 13, 2000 at 09:21:21AM +0100, Nick Ing-Simmons wrote:
> Jarkko Hietaniemi <[EMAIL PROTECTED]> writes:
> >>    bytes_to_utf8($string, $encoding)
> >>    utf8_to_bytes($string, $encoding)
> >
> >Scratch these.  Bytes are in no encoding.  They are numbers.
> 
> Yeah - but it is only a matter of time before we want
> to take a bunch of Shift-JIS bytes and turn them into perl chars.

Hmmm...

> We will need  
>         nativebytes_to_chars($string,$encoding);
> 
> 
> I still think this indicates API is too "implementation centric" 
> I am worried about perl-code having all these representations
> spelt out. To _me_ the whole point of the UNICODE approach is 
> that we can do anything by 
> 
>         whatever-to-UNICODE, massage, UNICODE-to-wanted.
> 
> I think bytes_to_utf8 is a worrying opposite of that - that says we 
> start with some "binary" bytes that perl cannot use char ops on,

Assume I have a string, a bunch of bytes that makes sense in Shift-JIS,
as Shift-JIS characters.  Now, how am I going to get it to Unicode?
chars_to_blah() won't help since the bytes are not yet Unicode chars.
So yes, I think you are right, we need the bytes_to_utf8(), and
bytes_to_chars() is then a natural convenience wrapper.
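
To make the Shift-JIS case concrete, here is a minimal sketch.  It
assumes the decode() and encode() functions of the Encode module as it
later shipped with Perl; bytes_to_chars() is only a hypothetical name
borrowed from the proposal in this thread, not an existing function.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Hypothetical wrapper named after the proposal in this thread;
# it simply delegates to Encode::decode.
sub bytes_to_chars {
    my ($bytes, $encoding) = @_;
    return decode($encoding, $bytes);
}

# Four raw bytes that make sense as two Shift-JIS characters
# (HIRAGANA LETTER A, HIRAGANA LETTER I).
my $sjis  = "\x82\xA0\x82\xA2";
my $chars = bytes_to_chars($sjis, 'shiftjis');

printf "%d bytes in, %d chars out\n", length($sjis), length($chars);

# Once decoded, character operations such as substr() and ord()
# see characters rather than bytes.
printf "first char is U+%04X\n", ord(substr($chars, 0, 1));

# The other direction: chars to UTF-8 encoded bytes, e.g. for a
# protocol that wants UTF-8 on the wire.
my $utf8 = encode('UTF-8', $chars);
printf "%d UTF-8 bytes out\n", length($utf8);
```

The point of the wrapper is only that the encoding is named once, at
the boundary; everything between decode and encode works on chars.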

> and converts it to another sequence of bytes which again are 
> not perl "chars". If substr() et al. are going to pull out UTF8
> encoded bytes (as LDAP needs) then perl cannot say /[:alpha:]/.

-- 
$jhi++; # http://www.iki.fi/jhi/
        # There is this special biologist word we use for 'stable'.
        # It is 'dead'. -- Jack Cohen