2009/12/20 Henrik Sperre Johansen <[email protected]>:
> On 20.12.2009 20:04, Igor Stasenko wrote:
>> Hello,
>> I finished this stuff, and it's ready for adoption.
>>
> Nice!
>> See http://bugs.squeak.org/view.php?id=7428
>>
>> Does anyone want to help push it into the trunk update stream (using MC configs)?
>>
>> It works fine on a recent trunk image;
>> on Pharo, however, I had some problems installing the changes because of
>> some differences.
>>
>> Tried on PharoCore-1.1-11106-ALPHA.image
>>
>> phase2.1.cs
>> - do not file in the TextEditor changes, since pharo-core doesn't have it.
>> - do not file in the last line (reorganizing).
>>
>> - tests fail because the Pharo String class implements neither
>> #squeakToUtf8
>> nor
>> #utf8ToSqueak
>>
>> Do we have a uniform way to encode ANY String -> ByteString(utf8)
>> and back? What does the ANSI standard say about it? Maybe I'm using the wrong methods?
>>
> "3.4.6.4 - It is erroneous if stringBody contains any characters that
> does not exist in the implementation
> defined execution character set used in the representation of character
> objects."
> So, implementation defined.
> Every internal String (in Squeak and Pharo) (afaik) should be either
> latin1 (ByteStrings) or utf32 with the high byte used to
> differentiate the language of the string.
>
> To me, sending squeakToUtf8, then using StandardFileStream instead of
> FileStream seems safe.
> As long as the ByteString's bytes are utf8, utf8ToSqueak works. (And in
> most other cases as well.)
> In fact, it's safer for non-utf8 strings than UTF8Decoder, which does
> not perform validity checks (it only reads the total number of bytes) when
> encountering bytes > 127.
> The reason it seems mostly for internal use (to me) is that it
> silently falls back to assuming the string is already in latin1 (i.e., the
> "valid" ByteString format), instead of raising an error like the stream
> decoder does.
> (Which, by the way, would be much nicer if it were a
> MalformedUTF8Error or some such...)
>
> ws := StandardFileStream newFileNamed: 'test.txt'.
> "Save as latin1"
> ws nextPutAll: 'ååå'.
> ws close.
>
> "Read with UTF8Decoder"
> rs := FileStream oldFileNamed: 'test.txt'.
> "Print this, gives a ?"
> rs contents.
> rs close.
>
> "Read with Latin1Decoder"
> rs := StandardFileStream oldFileNamed: 'test.txt'.
> "Print this, gives ååå, since it's not valid utf8 and thus assumed to be latin1"
> rs contents utf8ToSqueak.
> rs close.
>> Still, I think we need this thing standardized and common to all
>> dialects (not just Pharo/Squeak).
>>
> There's really only one way to store characters in a ByteArray (i.e.
> ByteString) and call it utf8 encoded.
> As far as I can tell, Squeak seems to do the right thing :)
> I believe Nicolas pushed for an implementation in Pharo some time ago; not
> sure what happened to that.
>

I seem to have solved this by using #convertToEncoding: / #convertFromEncoding:.
Tests are working fine after that.
However, I haven't yet tried it with source containing characters outside Latin1.

> Cheers,
> Henry
>
> _______________________________________________
> Pharo-project mailing list
> [email protected]
> http://lists.gforge.inria.fr/cgi-bin/mailman/listinfo/pharo-project
>

--
Best regards,
Igor Stasenko AKA sig.
