On Apr 1, 2010, at 3:47:02 PM, Michael Rueger wrote:

> On 4/1/2010 3:26 PM, Stéphane Ducasse wrote:
> 
>>> Until someone proves me wrong I would say that Pharo is UTF-8 clean :-)
>> 
>> Sure but what does it mean for stef the stupid:
>>      characters/strings are encoded in UTF-8 or optimized ->  127 and then after
>> If you could just write a little paragraph that I understand once :)
> 
> I'll try :-)
> 
> And I should have written Unicode clean, not UTF-8.
> 
> So, modulo bugs like the one Henrik pointed out, Pharo
> - keeps all strings in the image in Unicode: either as byte strings, for 
> strings that do not contain any characters larger than 127, or as WideString 
> otherwise, using basically UTF-32 encoding.
> - has all en/decoders fixed to do the correct *-encoding to Unicode 
> translation and back
> - utilizes the Unicode character entry in the input events, so it should be 
> possible to input all Unicode characters on the different keyboards (US, 
> German, French, Russian, etc.)
> - uses Unicode encoding for filenames
> - uses Unicode encoding for the clipboard
> 
> Hope I didn't leave out anything important :-)

I'd like to add
"Other means of importing/exporting code", i.e. the reading/writing logic 
for .mcz, .st and .cs files. 
This is where, to me, there still seem to be shady areas in Pharo, mostly 
because the different readers seem to assume different encodings for input 
that is not readable as UTF-8. 
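
To illustrate why differing fallback assumptions matter, here is a minimal 
sketch in Python (not Pharo code). The function name decode_source and the 
two fallback encodings are hypothetical, chosen only for the example; which 
encodings the actual Pharo readers assume is not stated here.

```python
def decode_source(raw: bytes, fallback: str) -> str:
    """Decode as UTF-8 if possible, otherwise use the given fallback.

    Hypothetical helper: it only demonstrates that the fallback choice
    changes the result for input that is not valid UTF-8.
    """
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        return raw.decode(fallback)


# A single byte 0xE9 is "é" in Latin-1 but is not valid UTF-8.
raw = "é".encode("latin-1")

as_latin1 = decode_source(raw, "latin-1")
as_macroman = decode_source(raw, "mac_roman")

# The same bytes yield different text under different fallbacks.
print(as_latin1 == as_macroman)
```

If every reader went through one such routine with a single agreed fallback, 
the same file would at least decode the same way everywhere.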

Also, I don't really mind that we keep some strings as Latin-1 when possible, 
as they tend to be a lot faster to process, and the conversion between them 
and WideStrings is trivial and, as far as I can tell, reliable.
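
A rough sketch of why that conversion is trivial, written in Python rather 
than Pharo: Latin-1 byte values are identical to the first 256 Unicode code 
points, so widening is a plain copy. The names widen and narrow are 
hypothetical and do not correspond to any Pharo method.

```python
from typing import List, Optional


def widen(compact: bytes) -> List[int]:
    """One-byte-per-character string -> list of Unicode code points.

    Each Latin-1 byte value already is the code point, so no table
    lookup is needed.
    """
    return list(compact)


def narrow(code_points: List[int]) -> Optional[bytes]:
    """Return the compact form, or None if any code point exceeds 0xFF."""
    if all(cp <= 0xFF for cp in code_points):
        return bytes(code_points)
    return None


text = "café"
compact = text.encode("latin-1")
wide = widen(compact)

# Round trip: widening and narrowing give back the original bytes.
print(narrow(wide) == compact)
```

Narrowing only fails when a string really contains a character outside the 
one-byte range, which is exactly the case where the wide form is needed.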

Cheers,
Henry

PS. The other issues with rendering absolutely apply to Pharo, but are not 
quite related to string encoding per se.
_______________________________________________
Pharo-project mailing list
[email protected]
http://lists.gforge.inria.fr/cgi-bin/mailman/listinfo/pharo-project