2010/3/29 Nicolas Cellier <[email protected]>:
> 2010/3/29 Henrik Johansen <[email protected]>:
>>
>> On Mar 29, 2010, at 2:00:09 PM, Nicolas Cellier wrote:
>>
>>> 2010/3/29 Henrik Johansen <[email protected]>:
>>>>
>>>> On Mar 29, 2010, at 11:52:43 AM, Nicolas Cellier wrote:
>>>>
>>>>> 2010/3/29 Henrik Johansen <[email protected]>:
>>>>>>
>>>>>> On Mar 29, 2010, at 11:16:30 AM, Nicolas Cellier wrote:
>>>>>>
>>>>>>> I presume that by "latin1" you mean code page 1252
>>>>>>> rather than iso8859-L1, right?
>>>>>>>
>>>>>>> Nicolas
>>>>>> Good question :)
>>>>>> What IS the presumed internal encoding of ByteStrings in Squeak?
>>>>>> That's the one I meant; I merely assumed it was latin1, since the
>>>>>> text converter refers to it as such.
>>>>>> Personally I thought it was iso8859-L1, since the ByteString to
>>>>>> Unicode conversion simply maps chars > 127 directly onto the
>>>>>> 0080 - 00FF range.
>>>>>>
>>>>>> Cheers,
>>>>>> Henry
>>>>>>
>>>>>
>>>>> From what I understood, CP1252 is Microsoft's "latin1" and uses codes
>>>>> 128 to 159.
>>>>> ISO8859-L1 matches the first 256 Unicode code points (Latin-1) and
>>>>> leaves codes 128 to 159 unused.
>>>>> You know, when Microsoft "uses" a standard, it's always a better
>>>>> standard ;)
>>>>>
>>>>> I have nothing against CP1252; it's an optimization that avoids
>>>>> wasting 32 cheap codes.
>>>>> But I'm not sure about various compatibility issues in/with the
>>>>> external world...
>>>>>
>>>>> Squeak clearly uses CP1252.
>>>>> For Pharo, there might be a mix of the two since the Sophie-like
>>>>> refactorings. That is surely what John was referring to.
>>>>>
>>>>> Nicolas
>>>>
>>>> Ummm...
>>>> All the UTF-8 converters in Squeak use Unicode value:, which maps
>>>> charCodes 128->255 directly to Unicode values 128->255.
>>>> Unicode values 128->255 ARE iso8859-L1, so if Squeak uses CP1252 as its
>>>> internal format, all the converters in Squeak are wrong.
>>>>
>>>> Cheers,
>>>> Henry
>>>>
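To make Henry's point concrete, here is a minimal Python sketch (not Squeak code) contrasting the two readings of a ByteString. `to_utf8_identity` mimics the direct charCode -> code point mapping he attributes to the converters; `to_utf8_cp1252` is what a converter would have to do if the internal encoding really were CP1252. Both function names are hypothetical.

```python
def to_utf8_identity(byte_string: bytes) -> bytes:
    """Treat each charCode as a Unicode code point (i.e. assume ISO-8859-1)."""
    return "".join(chr(b) for b in byte_string).encode("utf-8")


def to_utf8_cp1252(byte_string: bytes) -> bytes:
    """Interpret the bytes as Windows-1252 first, then encode to UTF-8."""
    return byte_string.decode("cp1252").encode("utf-8")


if __name__ == "__main__":
    sample = bytes([0x41, 0x80, 0xE9])  # 'A', then 16r80, then 16rE9
    print(to_utf8_identity(sample))  # b'A\xc2\x80\xc3\xa9' (U+0080, a control char)
    print(to_utf8_cp1252(sample))    # b'A\xe2\x82\xac\xc3\xa9' (U+20AC, the euro sign)
```

Note that the two only diverge on 16r80: for 16rE9 (é) both produce the same UTF-8 bytes.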
>>>
>>> ISO8859-L1 and CP1252 only differ for code points 16r80 to 16r9F.
>>> Contrary to what I said, these code points are assigned to C1
>>> control characters (has anyone ever used these?).
>>> See http://en.wikipedia.org/wiki/ISO_8859-1 and
>>> http://en.wikipedia.org/wiki/Windows-1252
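The two Wikipedia tables can be cross-checked with Python's built-in `latin-1` and `cp1252` codecs. This is a small sketch, assuming those codecs follow the published tables; it collects every byte value on which the two decodings disagree. The helper names are made up for illustration.

```python
import unicodedata
from typing import Dict, Optional, Tuple


def decode_or_none(b: int, codec: str) -> Optional[str]:
    """Decode a single byte with `codec`, or return None if it is unassigned."""
    try:
        return bytes([b]).decode(codec)
    except UnicodeDecodeError:
        # cp1252 leaves 16r81, 16r8D, 16r8F, 16r90 and 16r9D unassigned.
        return None


def latin1_vs_cp1252() -> Dict[int, Tuple[str, Optional[str]]]:
    """Map every byte where ISO-8859-1 and CP1252 disagree to both decodings."""
    diffs = {}
    for b in range(256):
        latin1 = decode_or_none(b, "latin-1")  # never fails: identity mapping
        cp1252 = decode_or_none(b, "cp1252")
        if latin1 != cp1252:
            diffs[b] = (latin1, cp1252)
    return diffs


if __name__ == "__main__":
    for b, (latin1, cp1252) in sorted(latin1_vs_cp1252().items()):
        shown = cp1252 if cp1252 is not None else "<unassigned>"
        category = unicodedata.category(latin1)  # "Cc" = control character
        print(f"16r{b:02X}  latin-1: U+{ord(latin1):04X} ({category})  cp1252: {shown}")
```

Running it should list exactly the 32 values 16r80 to 16r9F, and on the latin-1 side every one of them is a control character.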
>>
>> Not to my knowledge :)
>> The strong argument for using latin1 rather than 1252 as the internal
>> charset for ByteString is the 1-1 mapping to Unicode values.
>>
>>>
>>> Now, I'm not so sure anymore why I thought Squeak was CP1252. Is it?
>> Seems ambiguous.
>>
>>> My guess was probably based on the macToSqueak and squeakToMac implementations.
>>
>> Yes, that does indeed do a MacRoman -> CP1252 transformation. As does
>> MacRomanTextConverter, in Pharo as well...
>> Converters assuming different internal encodings, fonts which render a
>> charset different from both of them... Fun, eh?
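A rough Python approximation of such a MacRoman -> CP1252 transformation also shows where it has to give up. This is not the actual macToSqueak code, just the same idea expressed with Python's `mac_roman` and `cp1252` codecs. Several MacRoman characters, such as ∞ or √, have no CP1252 code at all, so the hypothetical helper below substitutes '?' for them.

```python
def mac_roman_to_cp1252(data: bytes) -> bytes:
    """Re-encode MacRoman bytes as CP1252.

    Characters that exist in MacRoman but not in CP1252 are replaced by b'?'
    (errors="replace"), so the conversion is lossy.
    """
    return data.decode("mac_roman").encode("cp1252", errors="replace")


if __name__ == "__main__":
    print(mac_roman_to_cp1252(bytes([0x8E])))  # é: MacRoman 16r8E -> b'\xe9'
    print(mac_roman_to_cp1252(bytes([0xB0])))  # ∞: MacRoman 16rB0 -> b'?'
```

Any internal encoding limited to 256 codes will force a choice like this somewhere. That is one more argument for the 1-1 Unicode mapping.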
>>
>>> But the rendering of the following snippet isn't CP1252-compliant:
>>>
>>> String withAll: ((16r80 to: 16r9F) collect: [:e | Character value: e])
>>> or
>>> (16r80 to: 16r9F) collect: [:e | Character value: e] as: String
>>> '•™≠∞≥∑∫Ω√≈…—‘Ÿ⁄∂∆Œ‚„‰ˆ˜˘˙˚˝˛ˇıƒ'
>>>
>
> I intentionally included the above string in the mail just for the fun of
> it...
> My gmail/firefox browser originally displayed boxed control characters.
> Now, in the same browser, I read back some math symbols in your answer:
> centered dot, trademark, not-equal, infinity, greater-or-equal,
> summation, etc.
> At least, you can see that "conforming to external world rules" might
> be pretty difficult.
> I would add silly too :)
>

And gmail now displays my original mail with the CP1252 interpretation :)
M$-friendly?

>
>>> In Squeak 4.1 the different fonts don't agree on rendering these 
>>> characters...
>>> DefaultFixedTextStyle still uses MacRoman and displays accented
>>> characters.
>>> DefaultTextStyle hacks the first 4 entries with caret, underscore, left arrow
>> Yup, Bitmap DejaVu is latin15 (some characters differ from latin1,
>> among them the €), with 4 extra entries as you mentioned.
>>> and up arrow (probably a Cuis hack)
>>> Accu* just seems to have a hack for left arrow.
>> Yeah, they seem to cover... a blend of latin1, latin15 (has euro symbol), 
>> and something else (square-root :D ). Wee.
>>
>> Render with a Unicode font, and you get nothing but []'s, which would be the 
>> correct latin1-rendering of said string.
>>
>> Which is why I said an encoding property was needed for the StrikeFonts,
>> so you can do the proper conversion of internal string charcodes to the
>> charcode values the font expects (or rather, bitmap offsets).
>> This of course means you'd have to come up with a consistent definition
>> of what the internal ByteString encoding in Squeak is first, though.
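Here is roughly what such an encoding property could look like, sketched in Python rather than Smalltalk. `StrikeFontSketch` and its `encoding` and `missing_glyph` fields are hypothetical. The sketch assumes the internal string is already a sequence of Unicode code points (the latin1 reading), which is exactly the definition Henry says has to be settled first.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class StrikeFontSketch:
    """Hypothetical stand-in for a bitmap font that declares its own charset."""
    name: str
    encoding: str           # a single-byte Python codec, e.g. "mac_roman"
    missing_glyph: int = 0  # bitmap offset to use when the font lacks a glyph

    def glyph_indices(self, text: str) -> List[int]:
        """Translate Unicode code points into the offsets this font expects."""
        indices = []
        for ch in text:
            try:
                # Single-byte encodings yield exactly one byte per character.
                indices.append(ch.encode(self.encoding)[0])
            except UnicodeEncodeError:
                indices.append(self.missing_glyph)
        return indices
```

With this shape, `StrikeFontSketch("DejaVu", "iso8859_15").glyph_indices("€")` yields `[0xA4]`. A font declared as `latin-1` falls back to the missing glyph for the same character, since ISO-8859-1 has no euro sign.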
>>
>>
>>> Maybe with a bit more clean-up (Character euro answers the
>>> MacRoman code, for example,
>> The keyboard input handling in Squeak does strange things, at least on a
>> Mac...
>> Alt-§ (which gives a euro symbol on my keyboard layout) is read as a
>> WideChar with the correct Unicode value in Pharo, but as Char 164 in Squeak.
>> Alt-5 (∞) does a similar thing: it reads as the correct WideChar in Pharo,
>> but in Squeak turns into char 129.
>>> and taking the MacRoman conversions from
>>> Sophie/Pharo), we could declare that Squeak uses Unicode...
>>> Great!
>>>
>>> Nicolas
>>
>>
>> That would be my dream as well.
>> Or really, I'd settle for any unambiguous definition of what the ByteString
>> encoding is.
>> "A little more clean-up" may or may not be an understatement, though; it
>> would involve going through all the converters, all keyboard-input
>> processing code (which seems to be more stable in Pharo on Mac), and all
>> places where strings enter/leave the system. :)
>>
>
> I won't answer the following mail; Michael took care of that in Pharo :)
> Let's do it in Squeak too.
>
> Nicolas
>
>> Cheers,
>> Henry
>>
>>
>> _______________________________________________
>> Pharo-project mailing list
>> [email protected]
>> http://lists.gforge.inria.fr/cgi-bin/mailman/listinfo/pharo-project
>
