2010/3/29 Henrik Johansen:
>
> On Mar 29, 2010, at 11:52:43 AM, Nicolas Cellier wrote:
>
>> 2010/3/29 Henrik Johansen:
>>>
>>> On Mar 29, 2010, at 11:16:30 AM, Nicolas Cellier wrote:
>>>
>>>> I presume that under the idiom "latin1" you refer to code page 1252
>>>> rather than iso8859-1, right ?
>>>>
>>>> Nicolas
>>> Good question :)
>>> What IS the presumed internal encoding of ByteStrings in Squeak?
>>> That's the one I meant, I merely assumed it was latin1 seeing as how the
>>> text converter refers to it as such.
>>> Personally I thought it was iso8859-1, seeing as the ByteString to Unicode
>>> conversion does a simple shift of chars > 127 to the 0080 - 00FF range.
>>>
>>> Cheers,
>>> Henry
>>>
>>
>> From what I understood, CP1252 is Microsoft "latin1" and uses codes 128 to
>> 159.
>> ISO8859-1 matches the first 256 codes of Unicode latin-1 and has codes 128
>> to 159 unused.
>> You know, when Microsoft "uses" a standard, it's always a better standard ;)
>>
>> I have nothing against CP1252, it's an optimization which avoids
>> wasting 32 cheap codes.
>> But I'm not sure about various compatibility issues in/with the
>> external world...
>>
>> Squeak clearly uses CP1252.
>> For Pharo, there might be a mix of the two since Sophie-like
>> refactorings. Surely what John was referring to.
>>
>> Nicolas
>
> Ummm...
> All the utf8-converters in Squeak use Unicode value:, which maps directly
> from charCode 128->255 to Unicode value 128->255.
> Unicode value 128->255 IS iso8859-1, so if Squeak uses CP1252 as internal
> format, all the converters in Squeak are wrong.
>
> Cheers,
> Henry
>
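To make the quoted argument concrete, here is a minimal workspace sketch. It is not Squeak's converter code. It assumes a hand-written table, here called cp1252High (a name chosen only for this illustration), holding the Unicode code points that Windows-1252 assigns to bytes 16r80 to 16r9F, and compares it with the identity mapping that the quoted message attributes to Unicode value: for byte values above 127.

```smalltalk
"Workspace sketch, not Squeak's actual converter code.
 cp1252High is a hypothetical table: entry i is the Unicode code point that
 Windows-1252 assigns to byte 16r7F + i. A 0 marks the five bytes
 (16r81 16r8D 16r8F 16r90 16r9D) that Windows-1252 leaves undefined."
| cp1252High differing |
cp1252High := #(
    16r20AC 0       16r201A 16r0192 16r201E 16r2026 16r2020 16r2021
    16r02C6 16r2030 16r0160 16r2039 16r0152 0       16r017D 0
    0       16r2018 16r2019 16r201C 16r201D 16r2022 16r2013 16r2014
    16r02DC 16r2122 16r0161 16r203A 16r0153 0       16r017E 16r0178).

"Reading a byte as ISO 8859-1 is the identity mapping: byte b -> code point b.
 Select the bytes in 16r80..16r9F for which CP1252 gives a different answer."
differing := (16r80 to: 16r9F) select: [:byte |
    (cp1252High at: byte - 16r7F) ~= byte].

differing size.                  "32: the two mappings never agree in this range"
cp1252High at: 16r80 - 16r7F.    "8364 (16r20AC, the euro sign); the identity mapping gives 16r80"
```

Evaluating the last two expressions with "print it" shows that a converter which passes byte values through unchanged decodes ISO 8859-1, and that it disagrees with CP1252 on every byte from 16r80 to 16r9F.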
ISO8859-1 and CP1252 differ only for code points 16r80 to 16r9F.
Contrary to what I said, in ISO8859-1 these code points are assigned to the
C1 control characters (has anyone ever used these?).
See http://en.wikipedia.org/wiki/ISO_8859-1
and http://en.wikipedia.org/wiki/Windows-1252

Now I'm no longer sure why I thought Squeak was CP1252. Is it?
My guess was probably based on the macToSqueak and squeakToMac implementations.
But the rendering of the following snippet does not comply with CP1252:

    String withAll: ((16r80 to: 16r9F) collect: [:e | Character value: e])

or

    (16r80 to: 16r9F) collect: [:e | Character value: e] as: String

' '

In Squeak 4.1 the different fonts don't agree on how to render these characters...
DefaultFixedTextStyle still uses MacRoman and displays accented characters.
DefaultTextStyle hacks the first 4 entries with caret, underscore, left arrow
and up arrow (probably a Cuis hack).
Accu* just seems to have a hack for the left arrow.

Maybe with a bit more clean-up (Character euro answers the MacRoman code, for
example, and we could take the MacRoman conversions from Sophie/Pharo), we
could declare that Squeak is using Unicode...
Great!

Nicolas

_______________________________________________
Pharo-project mailing list
http://lists.gforge.inria.fr/cgi-bin/mailman/listinfo/pharo-project
