2010/3/29 Nicolas Cellier <[email protected]>:
> 2010/3/29 Henrik Johansen <[email protected]>:
>>
>> On Mar 29, 2010, at 2:00 09PM, Nicolas Cellier wrote:
>>
>>> 2010/3/29 Henrik Johansen <[email protected]>:
>>>>
>>>> On Mar 29, 2010, at 11:52 43AM, Nicolas Cellier wrote:
>>>>
>>>>> 2010/3/29 Henrik Johansen <[email protected]>:
>>>>>>
>>>>>> On Mar 29, 2010, at 11:16 30AM, Nicolas Cellier wrote:
>>>>>>
>>>>>>> I presume that under the idiom "latin1" you refer to code page 1252
>>>>>>> rather than iso8859-L1, right ?
>>>>>>>
>>>>>>> Nicolas
>>>>>> Good question :)
>>>>>> What IS the presumed internal encoding of Bytestrings in Squeak?
>>>>>> That's the one I meant, I merely assumed it was latin1 seeing as how the
>>>>>> text converter refers to it as such.
>>>>>> Personally I thought it was iso8859-L1, seeing as the bytestring to
>>>>>> unicode conversion does a simple shift of chars > 127 to the 0080 - 00FF
>>>>>> range.
>>>>>>
>>>>>> Cheers,
>>>>>> Henry
>>>>>>
>>>>>
>>>>> From what I understood, CP1252 is Microsoft "latin1" and uses codes 128 to
>>>>> 159.
>>>>> ISO8859-L1 matches the first 256 codes of unicode latin-1 and has codes 128
>>>>> to 159 unused.
>>>>> You know, when Microsoft "uses" a standard, it's always a better standard
>>>>> ;)
>>>>>
>>>>> I have nothing against CP1252, it's an optimization which avoids
>>>>> wasting 32 cheap codes.
>>>>> But I'm not sure about various compatibility issues in/with the
>>>>> external world...
>>>>>
>>>>> Squeak clearly uses CP1252.
>>>>> For Pharo, there might be a mix of the two since Sophie-like
>>>>> refactorings. Surely what John was referring to.
>>>>>
>>>>> Nicolas
>>>>
>>>> Ummm...
>>>> All the utf8-converters in squeak use Unicode value:, which maps directly
>>>> from charCode 128->255 to Unicode value 128->255.
>>>> Unicode value 128->255 IS iso8859-L1, so if squeak uses CP1252 as internal
>>>> format, all the converters in Squeak are wrong.
>>>>
>>>> Cheers,
>>>> Henry
>>>>
>>>
>>> ISO8859-L1 and CP1252 only differ for code points 16r80 to 16r9F.
>>> Contrary to what I said, these code points are assigned to C1
>>> control characters (anyone ever used these ?).
>>> See http://en.wikipedia.org/wiki/ISO_8859-1 and
>>> http://en.wikipedia.org/wiki/Windows-1252
>>
>> Not to my knowledge :)
>> The strong argument for using latin1 as internal charset for ByteString vs
>> 1252 is the 1-1 mapping to unicode values.
>>
>>>
>>> Now, I'm not so sure anymore why I thought squeak was CP1252. Is it ?
>> Seems ambiguous.
>>
>>> My guess was probably based on macToSqueak and squeakToMac implementation.
>>
>> Yes, that does indeed do MacRoman -> 1252 transformation. As does
>> MacRomanTextConverter, in Pharo as well...
>> Converters assuming different internal encodings, fonts which render a
>> charset different from both of them... Fun eh?
>>
>>> But rendering of the following snippet isn't CP1252 compliant:
>>>
>>> String withAll: ((16r80 to: 16r9F) collect: [:e | Character value: e])
>>> or
>>> (16r80 to: 16r9F) collect: [:e | Character value: e] as: String
>>> '•™≠∞≥∑∫Ω√≈…—‘Ÿ⁄∂∆Œ‚„‰ˆ˜˘˙˚˝˛ˇıƒ'
>>>
>
> I intentionally included the above string in the mail just for the fun of
> it...
> My gmail/firefox browser originally did display boxed control characters,
> Now, in the same browser, I read back some math symbols in your answer...
> ... centered dot, Trade mark, different, infinity, greater or equal,
> summation etc...
> At least, you can see that "conforming to external world rules" might
> be pretty difficult
> I would add silly too :)
>
And gmail now display my original mail with CP1252 interpretation :) M$ friendly ?
>
>>> In Squeak 4.1 the different fonts don't agree on rendering these
>>> characters...
>>> DefaultFixedTextStyle is still using MacRoman and display accented
>>> characters.
>>> DefaultTextStyle hack first 4 entries with caret underscore left arrow
>> Yup, Bitmap DejaVu is latin15 (some characters different from latin1,
>> amongst them the € ), with 4 extra entries as you mentioned.
>>> and up arrow (probably a Cuis hack)
>>> Accu* just seem to have a hack for left arrow
>> Yeah, they seem to cover... a blend of latin1, latin15 (has euro symbol),
>> and something else (square-root :D ). Wee.
>>
>> Render with a Unicode font, and you get nothing but []'s, which would be the
>> correct latin1-rendering of said string.
>>
>> Which is why I said an encoding property for the StrikeFonts was needed, so
>> you can do the proper conversion of internal string charcodes to the
>> charcode values the font expects. (Or rather, bitmap offsets)
>> This of course means you'd have to come up with a consistent definition of
>> what the internal ByteString encoding in Squeak is first, though.
>>
>>
>>> Maybe with a bit more clean-up (Character euro is answering the
>>> MacRoman code for example,
>> The keyboardinput handling in Squeak does strange things, at least on a
>> Mac...
>> Alt - § (which gives a euro symbol on my keyboard layout) is read as a
>> WideChar with the correct unicode value on Pharo, but as Char 164 in Squeak.
>> Alt- 5 (∞) does a similar thing, reads as correct widechar on Pharo, but on
>> Squeak turns into char 129.
>>> and taking macRoman conversions from
>>> Sophie/Pharo), we could declare Squeak is using unicode...
>>> Great !
>>>
>>> Nicolas
>>
>>
>> That would be my dream as well.
>> Or really, I'd settle for any unambiguous definition of what the ByteString
>> encoding is.
>> "A little more clean-up" may or may not be an understatement though, it
>> would involve going through all the converters, all keyboard-input
>> processing code (seems to be more stable in Pharo on mac), and all places
>> where strings enters/leaves the system. :)
>>
>
> I won't answer following mail, Michael took care of that in Pharo:)
> Let's do it in Squeak too.
>
> Nicolas
>
>> Cheers,
>> Henry
>>
>>
>> _______________________________________________
>> Pharo-project mailing list
>> [email protected]
>> http://lists.gforge.inria.fr/cgi-bin/mailman/listinfo/pharo-project
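
For readers who want to reproduce the differences discussed in this thread outside of a Squeak or Pharo image, here is a minimal sketch. It is written in Python rather than Smalltalk, which is an assumption of convenience: the codec names `latin-1`, `cp1252` and `mac_roman` come from Python's standard library, not from Squeak's converters. The helper `glyph_offset` is hypothetical and only illustrates the "encoding property for StrikeFonts" idea, under the assumption that the internal ByteString encoding is ISO 8859-1.

```python
# Bytes 16r80..16r9F, the only range where ISO 8859-1 and CP1252 disagree.
raw = bytes(range(0x80, 0xA0))

# ISO 8859-1: each byte maps to the Unicode code point with the same value,
# which here means the C1 control characters U+0080..U+009F.
latin1 = raw.decode("latin-1")

# CP1252: most of these bytes are printable characters. Python's codec leaves
# five of them (0x81, 0x8D, 0x8F, 0x90, 0x9D) undefined, so they are replaced
# with U+FFFD here instead of raising an error.
cp1252 = raw.decode("cp1252", errors="replace")

# MacRoman: the same bytes are accented letters.
mac_roman = raw.decode("mac_roman")

for byte, a, b, c in zip(raw, latin1, cp1252, mac_roman):
    print(f"0x{byte:02X}  latin-1 U+{ord(a):04X}  "
          f"cp1252 U+{ord(b):04X}  mac_roman U+{ord(c):04X}")


def glyph_offset(char_code: int, font_encoding: str) -> int:
    """Hypothetical helper: map an internal char code to a font's glyph slot.

    Assumes the internal encoding is ISO 8859-1, so the char code is also
    the Unicode code point, and that the font is laid out in `font_encoding`.
    """
    return chr(char_code).encode(font_encoding)[0]


# e-acute is 16rE9 internally but sits at 16r8E in a MacRoman-ordered font.
print(hex(glyph_offset(0xE9, "mac_roman")))
```

The loop prints one row per byte, so the three interpretations can be compared side by side. `glyph_offset` raises `UnicodeEncodeError` for characters the font encoding cannot represent, which a real implementation would need to handle with a fallback glyph.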
