On Apr 1, 2010, at 3:47:02 PM, Michael Rueger wrote:

> On 4/1/2010 3:26 PM, Stéphane Ducasse wrote:
>
>>> Until someone proves me wrong I would say that Pharo is UTF-8 clean :-)
>>
>> Sure but what does it mean for stef the stupid:
>> characters/strings are encoded in UTF-8 or optimized -> 127 and then
>> after
>> If you could just rewrite a little paragraph that I understand once :)
>
> I'll try :-)
>
> And I should have written unicode clean, not UTF-8.
>
> So modulo bugs like the one Henrik pointed out Pharo
> - keeps all strings in the image in unicode. Either as byte strings for
>   strings that do not contain any characters larger than 127, WideString
>   otherwise using basically UTF-32 encoding.
> - has all en/decoders fixed to do the correct *-encoding to unicode and back
>   translation
> - utilizes the unicode character entry in the input events, so it should be
>   possible to input all unicode characters on the different keyboards (us,
>   german, french, russian, etc)
> - uses unicode encoding for filenames
> - uses unicode encoding for the clipboard
>
> Hope I didn't leave out anything important :-)
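
To make the first bullet concrete, here is a minimal sketch of that storage rule, written in Python rather than Smalltalk so it can be run anywhere. It is an illustration only, not Pharo's code: the function name `choose_representation` and the `byte_limit` parameter are made up for this example. The default limit of 127 follows the description quoted above; passing 255 corresponds to keeping Latin-1 strings as byte strings.

```python
def choose_representation(text, byte_limit=127):
    """Pick a compact or wide in-memory form for `text`.

    Hypothetical helper, not part of Pharo: it mimics the rule
    "byte string if every code point fits, wide string otherwise".
    `byte_limit` is an assumption: 127 matches the quoted description,
    255 would treat all of Latin-1 as byte-string material.
    """
    if all(ord(ch) <= byte_limit for ch in text):
        # Every code point fits in one byte, so store one byte per character.
        # Latin-1 maps code points 0..255 directly onto byte values.
        return "byte", text.encode("latin-1")
    # At least one character is too large: store four bytes per character,
    # i.e. the raw code points, which is what UTF-32 amounts to.
    return "wide", text.encode("utf-32-be")


if __name__ == "__main__":
    for sample in ["abc", "\u00e9t\u00e9", "\u03bb"]:
        kind, raw = choose_representation(sample)
        print(repr(sample), kind, len(raw))
```

With the default limit, `"abc"` stays a 3-byte string, while `"\u03bb"` (a single Greek lambda) becomes a 4-byte wide string. Converting between the two forms is just a per-character widening or narrowing of code points, which is why it can be cheap and lossless.
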

I'd like to add "other means of importing/exporting code", i.e. the reading/writing logic for .mcz, .st, and .cs files. To me, this is where there still seem to be shady areas in Pharo, mostly because the different readers seem to assume different encodings for input that is not readable as UTF-8.

Also, I don't really mind that we keep some strings as Latin-1 when possible: they tend to be a lot faster to process, and the conversion between them and WideStrings is trivial and, as far as I can tell, reliable.

Cheers,
Henry

PS. The other issues with rendering absolutely apply to Pharo, but they are not really related to string encoding per se.

_______________________________________________
Pharo-project mailing list
http://lists.gforge.inria.fr/cgi-bin/mailman/listinfo/pharo-project
