On Thu, 1 Apr 2010, Michael Rueger wrote:
On 4/1/2010 3:26 PM, Stéphane Ducasse wrote:
Until someone proves me wrong I would say that Pharo is UTF-8 clean :-)
Sure, but what does it mean for stef the stupid:
are characters/strings encoded in UTF-8, or optimized up to 127 and
stored differently after that?
If you could just write a little paragraph that I understand, once :)
I'll try :-)
And I should have written unicode clean, not UTF-8.
So, modulo bugs like the one Henrik pointed out, Pharo
- keeps all strings in the image in unicode: either as byte strings, for
strings that do not contain any characters larger than 127, or as WideString
This sounds really inefficient. Did you remove the primitive send from
ByteString >> #at:put:, or does the following still work, breaking the above
constraint?
(ByteString basicNew: 1) at: 1 put: (Character value: 128)
Levente
otherwise using basically UTF-32 encoding.
- has all en/decoders fixed to do the correct translation from *-encoding to
unicode and back
- utilizes the unicode character entry in the input events, so it should be
possible to input all unicode characters on the different keyboards (us,
german, french, russian, etc)
- uses unicode encoding for filenames
- uses unicode encoding for the clipboard
Hope I didn't leave out anything important :-)
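
To make the storage rule in the first bullet concrete, here is a minimal sketch in Python (not Pharo code). The names pick_representation and wide_code_points are hypothetical, and the cutoff is a parameter because the thread itself disagrees on it: the summary says 127, while Levente's example stores character 128 in a ByteString.

```python
def pick_representation(text, byte_limit=127):
    """Return the class name assumed to hold `text`.

    Assumption: `byte_limit` is the largest code point a byte string
    may contain. 127 follows the wording in this thread; pass 255 to
    model a byte string that uses the full byte range.
    """
    if all(ord(ch) <= byte_limit for ch in text):
        return "ByteString"
    return "WideString"


def wide_code_points(text):
    """Model the 'basically UTF-32' layout: one code point per slot."""
    return [ord(ch) for ch in text]


if __name__ == "__main__":
    for sample in ["abc", chr(128), "\u03bb"]:
        print(repr(sample),
              pick_representation(sample),
              pick_representation(sample, byte_limit=255),
              wide_code_points(sample))
```

With the default cutoff, character 128 forces the wide form; with byte_limit=255 it stays in the byte form, which is the case Levente's snippet probes.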
You still need to pick the correct en/decoder to interpret file contents
correctly; the system just can't know which encoding a file is in (see e.g.
TextEdit on the Mac, where you need to set the proper encoding as well).
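
A small illustration of why the decoder has to be chosen by hand, again sketched in Python rather than Pharo. The helper name decode_file_bytes and the sample text are made up for the example; the point is only that the same bytes decode without error under two encodings and give different strings.

```python
def decode_file_bytes(raw, encoding):
    """Interpret raw file bytes using the encoding named by the caller."""
    return raw.decode(encoding)


# Hypothetical file contents: the bytes a UTF-8 writer would produce.
original = "Gr\u00fc\u00dfe"          # "Grüße"
raw = original.encode("utf-8")

right = decode_file_bytes(raw, "utf-8")      # decoder matches the writer
wrong = decode_file_bytes(raw, "latin-1")    # also succeeds, but garbled

if __name__ == "__main__":
    print(right == original)   # the matching decoder round-trips
    print(wrong == original)   # the mismatched one does not
    print(len(raw), len(right), len(wrong))
```

Nothing in the bytes themselves says which reading is intended, so the choice has to come from outside: the user, a file header, or a convention.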
Michael
_______________________________________________
Pharo-project mailing list
[email protected]
http://lists.gforge.inria.fr/cgi-bin/mailman/listinfo/pharo-project