Re: [Pharo-project] squeakToUTF-8 and related?

Levente Uzonyi Sat, 03 Apr 2010 04:09:19 -0700

On Thu, 1 Apr 2010, Michael Rueger wrote:

On 4/1/2010 3:26 PM, Stéphane Ducasse wrote:
Until someone proves me wrong I would say that Pharo is UTF-8 clean :-)
Sure but what does it mean for stef the stupid:
characters/strings are encoded in UTF-8 or optimized -> 127 and thereafter
If you could just write a little paragraph that I understand once :)
I'll try :-)

And I should have written unicode clean, not UTF-8.

So modulo bugs like the one Henrik pointed out Pharo
- keeps all strings in the image in unicode. Either as byte strings for strings that do not contain any characters larger than 127, WideString

This sounds really inefficient. Did you remove the primitive send from ByteString >> #at:put:, or does the following still work, breaking the above constraint?


(ByteString basicNew: 1) at: 1 put: (Character value: 128)


Levente

otherwise using basically UTF-32 encoding.
- has all en/decoders fixed to do the correct *-encoding to unicode and back translation
- utilizes the unicode character entry in the input events, so it should be possible to input all unicode characters on the different keyboards (us, german, french, russian, etc)
- uses unicode encoding for filenames
- uses unicode encoding for the clipboard
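
A minimal workspace sketch of the two string representations mentioned in the list, for readers who want to poke at them. It is an illustration, not taken from the thread: the variable names are made up, and the comments describe what a Squeak-derived image is expected to answer, assuming ByteString >> #at:put: still falls back to widening the receiver when the primitive fails.

```smalltalk
"Illustrative only: evaluate in a workspace and 'print it' on the class sends."
| s euro |
euro := Character value: 16r20AC.   "EURO SIGN, does not fit in a byte-sized slot"
s := 'abc' copy.                     "copy, so the literal itself is left alone"
s class.                             "expected: ByteString"
s at: 1 put: euro.                   "store a character that a ByteString cannot hold"
s class.                             "expected: WideString, if the image widens in place"
```

Levente's one-liner probes the other side of the same mechanism: a value that still fits in a byte but is larger than 127.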

Hope I didn't leave out anything important :-)
You still need to pick the correct en/decoder to interpret file contents correctly, the system just can't know which encoding the file is in (see e.g. text edit on the mac, you need to set the proper encoding there as well).
Michael
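
A small sketch of why the decoder has to be chosen by hand: the same bytes are valid input for more than one decoder, and only one of them gives back the intended text. This assumes a recent Pharo image, where String >> #utf8Encoded and ByteArray >> #utf8Decoded are available; they are not part of the 2010-era image the thread is about, and the sample word is made up.

```smalltalk
"Assumption: recent Pharo image providing utf8Encoded / utf8Decoded."
| bytes |
bytes := 'Grüße' utf8Encoded.   "a ByteArray holding the UTF-8 bytes"
bytes utf8Decoded.               "right decoder: the original text comes back"
bytes asString.                  "one character per byte: each non-ASCII letter shows up as two characters"
```

Nothing in the bytes themselves says which reading is the right one, which is the point Michael makes about file contents.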

_______________________________________________
Pharo-project mailing list
[email protected]
http://lists.gforge.inria.fr/cgi-bin/mailman/listinfo/pharo-project

