2009/12/20 Henrik Sperre Johansen <[email protected]>:
> On 20.12.2009 20:04, Igor Stasenko wrote:
>> Hello,
>> I finished this stuff, and it's ready for adoption.
>>
> Nice!
>> See http://bugs.squeak.org/view.php?id=7428
>>
>> Does anyone want to help push it into the trunk update stream (using MC configs)?
>>
>> It works fine on a recent trunk image;
>> on Pharo, however, I had some problems installing the changes because of
>> some differences.
>>
>> Tried on PharoCore-1.1-11106-ALPHA.image
>>
>> phase2.1.cs
>> - do not file in the TextEditor changes, since pharo-core doesn't have it.
>> - do not file in the last line (reorganizing).
>>
>> - tests fail because Pharo's String class does not implement
>> #squeakToUtf8
>> nor
>> #utf8ToSqueak
>>
>> Do we have a uniform way to encode ANY String -> ByteString (utf8)
>> and back? What does the ANSI standard say about it? Maybe I'm using the wrong methods?
>>
> "3.4.6.4 - It is erroneous if stringBody contains any characters that
> does not exist in the implementation
> defined execution character set used in the representation of character
> objects."
> So, implementation defined.
> Every internal String (in Squeak and Pharo) should (afaik) be either
> latin1 (ByteStrings) or utf32 with the high byte used to
> differentiate the language of the string.
>
> To me, sending squeakToUtf8, then using StandardFileStream instead of
> FileStream seems safe.
> As long as the ByteString's bytes are utf8, utf8ToSqueak works. (And in
> most other cases as well.)
> In fact, it's safer than UTF8Decoder for non-utf8 strings: the decoder does
> not perform the validity checks (it only reads the total number of bytes) when
> encountering bytes > 127.
> The reason it seems (to me) to be mostly for internal use is that it
> silently falls back to assuming the string is already in latin1 (ie, the
> "valid" ByteString format), instead of raising an error like the stream
> decoder does. (Which, by the way, would be much nicer if it were a
> MalformedUTF8Error or some such...)
>
> ws := StandardFileStream newFileNamed: 'test.txt'.
> "Save as latin1"
> ws nextPutAll: 'ååå'.
> ws close.
>
> "Read with UTF8Decoder"
> rs := FileStream oldFileNamed: 'test.txt'.
> "Print this, gives a ?"
> rs contents.
> rs close.
>
> "Read with Latin1Decoder"
> rs := StandardFileStream oldFileNamed: 'test.txt'.
> "Print this, gives ååå: since it's not valid utf8, latin1 is assumed"
> rs contents utf8ToSqueak.
> rs close
>> Still, I think we need this standardized and common to all
>> dialects (not just Pharo/Squeak).
>>
> There's really only one way to store characters in a ByteArray (ie.
> ByteString) and call it utf8 encoded.
> As far as I can tell, Squeak seems to do the right thing :)
> I believe Nicolas pushed for implementation in Pharo some time ago, not
> sure what happened to that.
>

I seem to have solved this by using #convertToEncoding: / #convertFromEncoding:.
The tests work fine after that. However, I haven't yet tried using source
with characters other than Latin1.
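
A minimal workspace sketch of that round trip, for illustration only. It
assumes the image's TextConverter recognizes the encoding name 'utf-8'; the
variable names are hypothetical and not taken from the actual changeset.

```smalltalk
| original encoded decoded |
original := 'ååå'.
"Assumption: 'utf-8' is an encoding name this image's TextConverter knows."
"Encode the internal String into a String whose bytes are utf8."
encoded := original convertToEncoding: 'utf-8'.
"Decode the utf8 bytes back into an internal String."
decoded := encoded convertFromEncoding: 'utf-8'.
"Print this; the round trip is lossless if it answers true."
decoded = original
```

Source containing characters outside Latin1 would need the same check with
a WideString as the starting point.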

> Cheers,
> Henry
>
> _______________________________________________
> Pharo-project mailing list
> [email protected]
> http://lists.gforge.inria.fr/cgi-bin/mailman/listinfo/pharo-project
>



-- 
Best regards,
Igor Stasenko AKA sig.

