On 20.12.2009 20:04, Igor Stasenko wrote:
> Hello,
> I finished this stuff, and it's ready for adoption.
>
Nice!

> See http://bugs.squeak.org/view.php?id=7428
>
> Anyone want to help push it into the trunk update stream (using MC configs)?
>
> It works fine on a recent trunk image;
> on Pharo, however, I had some problems installing the changes because of
> some differences.
>
> Tried on PharoCore-1.1-11106-ALPHA.image
>
> phase2.1.cs
> - do not file in the TextEditor changes, since pharo-core doesn't have it.
> - do not file in the last line (reorganizing).
>
> - tests fail because the Pharo String class implements neither
> #squeakToUtf8
> nor
> #utf8ToSqueak
>
> Do we have a uniform way to encode ANY String -> ByteString(utf8)
> and back? What does the ANSI standard say about it? Maybe I'm using the wrong methods?
>
"3.4.6.4 - It is erroneous if stringBody contains any characters that does not exist in the implementation defined execution character set used in the representation of character objects."

So, implementation defined.
Every internal String (in Squeak and Pharo) should (afaik) be either latin1 (ByteStrings) or utf32, with the high byte used to distinguish the language of the string.
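As a minimal sketch of the round trip being asked about: this assumes a Squeak trunk image, where String implements #squeakToUtf8 and #utf8ToSqueak (PharoCore, as noted in the quoted text, does not). The variable names are only illustrative, and the snippet is meant to be evaluated in a workspace.

```smalltalk
| original encoded decoded |
"A ByteString holding three latin1 characters"
original := 'ååå'.
"ByteString whose bytes are the utf8 encoding of original"
encoded := original squeakToUtf8.
"Decode the utf8 bytes back to the internal representation"
decoded := encoded utf8ToSqueak.
"Print this; it should answer true if the round trip is lossless"
decoded = original
```

The encoded string is still a ByteString, so it can be written out byte for byte without any further conversion.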
To me, sending squeakToUtf8 and then using StandardFileStream instead of FileStream seems safe.

As long as the ByteString's bytes are utf8, utf8ToSqueak works (and in most other cases as well).
In fact, for non-utf8 strings it's safer than UTF8Decoder, which does not perform validity checks (it only reads the total number of bytes) when encountering bytes > 127.

The reason it seems (to me) mostly meant for internal use is that it silently falls back to assuming the string is already in latin1 (i.e., the "valid" ByteString format) instead of raising an error like the stream decoder does. (Which, by the way, would be much nicer if it were a MalformedUTF8Error or some such...)

    "Save as latin1"
    ws := StandardFileStream newFileNamed: 'test.txt'.
    ws nextPutAll: 'ååå'.
    ws close.

    "Read with UTF8Decoder"
    rs := FileStream oldFileNamed: 'test.txt'.
    "Print this, gives a ?"
    rs contents.
    rs close.

    "Read with Latin1Decoder"
    rs := StandardFileStream oldFileNamed: 'test.txt'.
    "Print this, gives ååå, since it's not valid utf8 and is thus assumed to be latin1"
    rs contents utf8ToSqueak.
    rs close.

> Still, I think we need this thing standardized and common to all
> dialects (not just Pharo/Squeak).
>
There's really only one way to store characters in a ByteArray (i.e. ByteString) and call it utf8 encoded. As far as I can tell, Squeak seems to do the right thing :)
I believe Nicolas pushed for an implementation in Pharo some time ago; not sure what happened to that.

Cheers,
Henry

_______________________________________________
Pharo-project mailing list
http://lists.gforge.inria.fr/cgi-bin/mailman/listinfo/pharo-project
