2009/12/20 Henrik Sperre Johansen <[email protected]>:
> On 20.12.2009 22:07, Igor Stasenko wrote:
>> 2009/12/20 Henrik Sperre Johansen<[email protected]>:
>>
>>> On 20.12.2009 20:04, Igor Stasenko wrote:
>>>
>>>> Hello,
>>>> i finished this stuff, and its ready for adoption.
>>>>
>>> Nice!
>>>
>>>> See http://bugs.squeak.org/view.php?id=7428
>>>>
>>>> Anyone wants to help pushing it into trunk update stream (using MC
>>>> configs)?
>>>>
>>>> It works fine on recent trunk image,
>>>> on pharo however i had some problems installing changes, because of
>>>> some differencies.
>>>>
>>>> Tried on PharoCore-1.1-11106-ALPHA.image
>>>>
>>>> phase2.1.cs
>>>> - do not filein the TextEditor changes, since pharo-core don't have it.
>>>> - do not filein the last line (reorganizing)..
>>>>
>>>> - tests failing because pharo String class does not implements
>>>> #squeakToUtf8
>>>> nor
>>>> #utf8ToSqueak
>>>>
>>>> Do we having an uniform way how to encode ANY String ->
>>>> ByteString(utf8)
>>>> and back? What ANSI standard saying about it? Maybe i'm using wrong
>>>> methods?
>>>>
>>> "3.4.6.4 - It is erroneous if stringBody contains any characters that
>>> does not exist in the implementation
>>> defined execution character set used in the representation of character
>>> objects."
>>> So, implementation defined.
>>> Every internal String (in Squeak and Pharo) (afaik) should be either
>>> latin1 (ByteStrings) or + utf32 with the high byte used for
>>> differentiation between language of the string.
>>>
>>> To me, sending squeakToUtf8, then using StandardFileStream instead of
>>> FileStream seems safe.
>>> As long as the ByteString's bytes is utf8, utf8ToSqueak works. (And in
>>> most other cases as well)
>>> In fact, it's safer than UTF8Decoder for non-utf8 strings, which does
>>> not perform the validity checks (only reads the total #of bytes) when
>>> encountering bytes > 127.
>>> The reason it seems mostly for internal use (to me) is the fact it
>>> silently falls back to assuming string is already in latin1 (ie, the
>>> "valid" ByteString format), instead of raising an error like the stream
>>> decoder does. (Which, by the way, would be much nicer if was a
>>> MalformedUTF8Error or some such...)
>>>
>>> ws := StandardFileStream newFileNamed: 'test.txt'.
>>> "Save as latin1"
>>> ws nextPutAll: 'ååå'.
>>> ws close.
>>>
>>> "Read with UTF8Decoder"
>>> rs := FileStream oldFileNamed: 'test.txt'.
>>> "Print this, gives a ?"
>>> rs contents.
>>> rs close
>>>
>>> "Read with Latin1Decoder"
>>> rs := StandardFileStream oldFileNamed: 'test.txt'.
>>> "Print this, gives ååå. since it's not valid utf8, thus assumes latin1"
>>> rs contents utf8ToSqueak.
>>> rs close
>>>
>>>> Still, i think we need this thing standartized and be common for all
>>>> dialects (not just Pharo/Squeak).
>>>>
>>> There's really only one way to store characters in a ByteArray (ie.
>>> ByteString) and call it utf8 encoded.
>>> As far as I can tell, Squeak seems to do the right thing :)
>>> I believe Nicolas pushed for implementation in Pharo some time ago, not
>>> sure what happened to that.
>>>
>> I seems solved this by using #convertToEncoding: / #convertFromEncoding: .
>> Tests working fine after that. I didn't tried however to use source
>> with other than Latin1 characters yet.
>>
> Converting to utf8 from ByteString/WideString should not be a problem,
> as long as you know the ByteString encoding is latin1. (Which it should
> if created it by any normal means)
> As long as you are SURE the string you are decoding is utf8 (like, when
> you've encoded them all yourself ;) ), convertFromEncoding: shouldn't be
> a problem either. (See previous mail, it's the same as used by
> FileStream, so lacks the validity checks).
>

Ok, thanks for clarification.
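
For reference, the round trip discussed above could be sketched roughly as follows. This is only a sketch, not a tested recipe: it assumes that #convertToEncoding: and #convertFromEncoding: accept the encoding name 'utf-8' (the exact name string is an assumption), and the variable names are made up for illustration.

```smalltalk
| original utf8 decoded |

"A plain ByteString, i.e. latin1 internally."
original := 'ååå'.

"Encode to UTF-8. The result is still a ByteString, but its bytes
 are now UTF-8 (each å should take two bytes instead of one).
 ASSUMPTION: 'utf-8' is an accepted encoding name."
utf8 := original convertToEncoding: 'utf-8'.

"Decode again. As pointed out above, this path does no validity
 checks, so only use it on bytes known to be UTF-8."
decoded := utf8 convertFromEncoding: 'utf-8'.

"Print this; it should answer true if the round trip is lossless."
decoded = original.
```

Since the decoding step lacks the validity checks, the sketch only makes sense for strings that were encoded by the same code in the first place.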

I also found other places in Pharo that use #(0 0 0 0) as a trailer, e.g. in
addTraitSelector: aSymbol withMethod: aCompiledMethod.
That needs to be fixed (as do all other places that try to use arrays to
define a trailer).

> Cheers,
> Henry

-- 
Best regards,
Igor Stasenko AKA sig.

_______________________________________________
Pharo-project mailing list
[email protected]
http://lists.gforge.inria.fr/cgi-bin/mailman/listinfo/pharo-project
