Re: [Pharo-project] squeakToUTF-8 and related?

Henrik Johansen Mon, 29 Mar 2010 03:20:26 -0700

On Mar 29, 2010, at 11:52 43AM, Nicolas Cellier wrote:

> 2010/3/29 Henrik Johansen <[email protected]>:
>> 
>> On Mar 29, 2010, at 11:16 30AM, Nicolas Cellier wrote:
>> 
>>> I presume that by "latin1" you mean code page 1252
>>> rather than iso8859-1, right?
>>> 
>>> Nicolas
>> Good question :)
>> What IS the presumed internal encoding of ByteStrings in Squeak?
>> That's the one I meant; I merely assumed it was latin1, seeing as how the 
>> text converter refers to it as such.
>> Personally I thought it was iso8859-1, seeing as the ByteString to Unicode 
>> conversion does a simple shift of chars > 127 to the 0080 - 00FF range.
>> 
>> Cheers,
>> Henry
>> 
> 
> From what I understood, CP1252 is Microsoft's "latin1" and uses codes 128 to 159.
> ISO8859-1 matches the first 256 codes of Unicode (Latin-1) and has codes 128
> to 159 unused.
> You know, when Microsoft "uses" a standard, it's always a better standard ;)
> 
> I have nothing against CP1252; it's an optimization which avoids
> wasting 32 cheap codes.
> But I'm not sure about various compatibility issues in/with the
> external world...
> 
> Squeak clearly uses CP1252.
> For Pharo, there might be a mix of the two since the Sophie-like
> refactorings. Surely that is what John was referring to.
> 
> Nicolas


Ummm...
All the UTF-8 converters in Squeak use Unicode value:, which maps directly from 
charCode 128->255 to Unicode value 128->255.
Unicode value 128->255 IS iso8859-1, so if Squeak uses CP1252 as its internal 
format, all the converters in Squeak are wrong.
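
To make the difference concrete, here is a small sketch in Python (not Squeak
code; it only uses Python's built-in "latin-1" and "cp1252" codecs, and the
helper name code_points is made up for illustration). It decodes the same two
bytes under both encodings and shows that only ISO 8859-1 maps each byte value
straight to the same Unicode code point:

```python
# Illustration only: this is Python, not the Squeak converters discussed above.
# Byte 0x80 lies in the disputed 128-159 range; byte 0xE9 lies above it.
raw = bytes([0x80, 0xE9])

as_latin1 = raw.decode("latin-1")  # ISO 8859-1: byte value == code point
as_cp1252 = raw.decode("cp1252")   # CP1252: 0x80 is the euro sign


def code_points(s):
    """Hypothetical helper: format each character as U+XXXX."""
    return [f"U+{ord(c):04X}" for c in s]


print(code_points(as_latin1))  # ['U+0080', 'U+00E9']
print(code_points(as_cp1252))  # ['U+20AC', 'U+00E9']

# The identity mapping holds for every byte under ISO 8859-1.
identity = all(ord(bytes([b]).decode("latin-1")) == b for b in range(256))
print(identity)  # True
```

The two encodings agree on 0xE9 and differ only in the 128-159 range, which is
why a converter that passes byte values through unchanged is correct for
ISO 8859-1 but not for CP1252.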

Cheers,
Henry


_______________________________________________
Pharo-project mailing list
[email protected]
http://lists.gforge.inria.fr/cgi-bin/mailman/listinfo/pharo-project
