RE: Transcoding patch

Tom Hughes Sun, 07 Oct 2001 08:53:40 -0700

In message <[EMAIL PROTECTED]>
          Gibbs Tanton - tgibbs <[EMAIL PROTECTED]> wrote:


> This is good, unless someone has objections I'll commit this.  However, we
> also need the ability to do unicode in the assembler (I'll do this later
> today if no one beats me to it), and we need some way to communicate the
> encoding number between the C and the Perl code.

It probably does still need some cleaning up, but that can be done
incrementally. One of the main things that I wasn't sure about, but
forgot to mention in the original message, is what we want to do
about malformed strings.

Are we going to assume strings are well formed and go hell for
leather in handling them or do we want to move to the paranoid
end of the spectrum and check everything we do and throw exceptions
when something odd is spotted?

Currently the code does a bit of both - sometimes it checks things
and sometimes it doesn't.
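
To make the two extremes concrete, here is a rough sketch for UTF-8
input. The function names are hypothetical and are not the ones used
in the patch. The first decoder trusts its input completely; the
second checks the lead byte, the available length, each continuation
byte, overlong forms, surrogates and the U+10FFFF ceiling, and returns
0 where real code would presumably throw an exception.

```c
#include <stddef.h>
#include <stdint.h>

/* "Hell for leather": assumes s points at a complete, well formed
 * sequence. Reads past the buffer or yields garbage otherwise. */
size_t utf8_decode_unchecked(const unsigned char *s, uint32_t *out)
{
    if (s[0] < 0x80) {
        *out = s[0];
        return 1;
    }
    if (s[0] < 0xE0) {
        *out = ((uint32_t)(s[0] & 0x1F) << 6) | (s[1] & 0x3F);
        return 2;
    }
    if (s[0] < 0xF0) {
        *out = ((uint32_t)(s[0] & 0x0F) << 12)
             | ((uint32_t)(s[1] & 0x3F) << 6)
             | (s[2] & 0x3F);
        return 3;
    }
    *out = ((uint32_t)(s[0] & 0x07) << 18)
         | ((uint32_t)(s[1] & 0x3F) << 12)
         | ((uint32_t)(s[2] & 0x3F) << 6)
         | (s[3] & 0x3F);
    return 4;
}

/* "Paranoid": returns the number of bytes consumed, or 0 if the
 * sequence starting at s (with len bytes available) is malformed. */
size_t utf8_decode_checked(const unsigned char *s, size_t len,
                           uint32_t *out)
{
    size_t need, i;
    uint32_t cp, min;

    if (len == 0)
        return 0;

    /* Classify the lead byte; min is the smallest code point that
     * legitimately needs this many bytes. */
    if (s[0] < 0x80)                { need = 1; cp = s[0];        min = 0; }
    else if ((s[0] & 0xE0) == 0xC0) { need = 2; cp = s[0] & 0x1F; min = 0x80; }
    else if ((s[0] & 0xF0) == 0xE0) { need = 3; cp = s[0] & 0x0F; min = 0x800; }
    else if ((s[0] & 0xF8) == 0xF0) { need = 4; cp = s[0] & 0x07; min = 0x10000; }
    else
        return 0;               /* stray continuation or bad lead byte */

    if (len < need)
        return 0;               /* truncated sequence */

    for (i = 1; i < need; i++) {
        if ((s[i] & 0xC0) != 0x80)
            return 0;           /* not a continuation byte */
        cp = (cp << 6) | (s[i] & 0x3F);
    }

    if (cp < min || cp > 0x10FFFF)
        return 0;               /* overlong form or out of range */
    if (cp >= 0xD800 && cp <= 0xDFFF)
        return 0;               /* surrogate code point */

    *out = cp;
    return need;
}
```

The checked version costs a handful of extra comparisons per
character, which is really the trade-off the question is about.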

> I guess the question with native strings is will it always be ASCII or will
> it be Shift-JIS etc...?  And the follow up to that is can, for the short
> term, we assume it will be ASCII and then improve our native string
> transcoding over time?

Well, according to string.pod, native will always be a single byte per
character encoding and never a wide character or shifted encoding, so
that rules out Shift-JIS and most other Far Eastern encodings.

BTW the claim in string.pod that UTF-8 needs a maximum of 3 bytes per
character is wrong, at least if you allow U+0000 to U+10FFFF as your
character space, which is what I did - any character over U+FFFF needs
four bytes.
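
The byte counts fall straight out of the encoding ranges. A minimal
sketch, with a hypothetical function name that is not taken from the
patch, and with no special treatment of surrogates:

```c
#include <stddef.h>
#include <stdint.h>

/* Number of bytes UTF-8 needs to encode code point cp, or 0 if cp
 * lies above U+10FFFF. Surrogates are not treated specially here. */
size_t utf8_encoded_length(uint32_t cp)
{
    if (cp < 0x80)
        return 1;       /* U+0000  .. U+007F   */
    if (cp < 0x800)
        return 2;       /* U+0080  .. U+07FF   */
    if (cp < 0x10000)
        return 3;       /* U+0800  .. U+FFFF   */
    if (cp <= 0x10FFFF)
        return 4;       /* U+10000 .. U+10FFFF */
    return 0;           /* outside the character space */
}
```

So a three byte maximum only holds if the character space stops at
U+FFFF.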

Tom

-- 
Tom Hughes ([EMAIL PROTECTED])
http://www.compton.nu/