2010/3/30 Norbert Hartl <[email protected]>:
>
> On 29.03.2010, at 11:52, Nicolas Cellier wrote:
>
> 2010/3/29 Henrik Johansen <[email protected]>:
>
> On Mar 29, 2010, at 11:16:30 AM, Nicolas Cellier wrote:
>
> I presume that under the idiom "latin1" you refer to code page 1252
>
> rather than iso8859-1, right?
>
> Nicolas
>
> Good question :)
>
> What IS the presumed internal encoding of Bytestrings in Squeak?
>
> That's the one I meant, I merely assumed it was latin1 seeing as how the
> text converter refers to it as such.
>
> Personally I thought it was iso8859-1, seeing as the bytestring to unicode
> conversion does a simple shift of chars > 127 to the 0080 - 00FF range.
>
> Cheers,
>
> Henry
>
>
> From what I understood, CP1252 is Microsoft's "latin1" and uses codes 128 to
> 159.
> ISO8859-1 matches the first 256 codes of Unicode (Latin-1) and leaves
> codes 128 to 159 unused.
> You know, when Microsoft "uses" a standard, it's always a better standard ;)
>
> I have nothing against CP1252, it's an optimization which avoids
> wasting 32 cheap codes.
> But I'm not sure about various compatibility issues in/with the
> external world...
>
> If you know how to easily assure that
> (String with: (Character value: (Integer readFrom: '20AC' base: 16)))
> = (String with: (Character value: (Integer readFrom: '80' base: 16)))
> then you might be safe. With Windows-1252, code points aren't unique
> anymore: every character in the range 0x80 - 0x9F also exists somewhere
> else in Unicode. So my estimate is that it will cause more trouble than
> it solves.
>

Agree.
I see two different problems here:
1) absence of explicit encoding information in external data
2) existence of a canonical representation which can be easily compared...

Generalization of UTF8 should solve 1 (slowly, with a lot of inertia),
then we can simply assume implicit=UTF8.
Unicode could solve 2...
...Well, as long as diacriticals are ignored.
To me Unicode still has problems with:
 (String with: 16r61 asCharacter with: 16r0302 asCharacter)
   = (String with: 16rE2 asCharacter)

Nicolas
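
To make the diacritical problem concrete, here is a minimal sketch. It is in Python rather than Smalltalk only because its standard library ships the normalization tables; I am not assuming any particular Squeak or Pharo API. The helper name `canonically_equal` is made up for illustration. It shows that the two spellings of "â" differ code point by code point, but compare equal once both sides are brought to the same normalization form (NFC here).

```python
import unicodedata

# 'a' followed by COMBINING CIRCUMFLEX ACCENT (16r61, 16r0302)
decomposed = "\u0061\u0302"
# LATIN SMALL LETTER A WITH CIRCUMFLEX (16rE2)
precomposed = "\u00e2"


def canonically_equal(left, right):
    """Compare two strings after normalizing both to NFC (hypothetical helper)."""
    return unicodedata.normalize("NFC", left) == unicodedata.normalize("NFC", right)


# A plain code point comparison sees two different sequences.
raw_equal = decomposed == precomposed

# After normalization the two spellings are the same string.
normalized_equal = canonically_equal(decomposed, precomposed)
```

So a canonical representation exists, but a plain `=` on code points does not give it for free: some normalization step has to run before comparing.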

> Squeak clearly uses CP1252.
> For Pharo, there might be a mix of the two since the Sophie-like
> refactorings. That is surely what John was referring to.
>
> In Pharo the 20AC string gives me a euro sign but the 80 hex one prints a
> rectangle which is _an_ interpretation of '?' ;)
> Norbert
>
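
The difference between the two readings of byte 16r80 can be sketched outside the image. This is a Python illustration using its built-in `cp1252` and `latin-1` codecs; it is not how Squeak or Pharo convert text, and the helper name `decode_both` is hypothetical. It decodes the same single byte under both encodings.

```python
def decode_both(byte_value):
    """Decode one byte as CP1252 and as ISO 8859-1 (hypothetical helper)."""
    raw = bytes([byte_value])
    return raw.decode("cp1252"), raw.decode("latin-1")


# Byte 0x80: CP1252 maps it to the euro sign U+20AC,
# while Latin-1 maps it straight to code point U+0080 (a control code).
euro_cp1252, euro_latin1 = decode_both(0x80)

# Byte 0xE2 lies outside 0x80 - 0x9F, so both encodings agree on U+00E2.
circumflex_cp1252, circumflex_latin1 = decode_both(0xE2)
```

Under the Latin-1 reading, 16r80 is a non-printing control code, which would be consistent with seeing a placeholder rectangle instead of a euro sign.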
> _______________________________________________
>
> Pharo-project mailing list
>
> [email protected]
>
> http://lists.gforge.inria.fr/cgi-bin/mailman/listinfo/pharo-project