2009/12/20 Henrik Sperre Johansen <[email protected]>:
> On 20.12.2009 22:07, Igor Stasenko wrote:
>> 2009/12/20 Henrik Sperre Johansen<[email protected]>:
>>
>>> On 20.12.2009 20:04, Igor Stasenko wrote:
>>>
>>>> Hello,
>>>> I finished this stuff, and it's ready for adoption.
>>>>
>>>>
>>> Nice!
>>>
>>>> See http://bugs.squeak.org/view.php?id=7428
>>>>
>>>> Does anyone want to help push it into the trunk update stream (using MC
>>>> configs)?
>>>>
>>>> It works fine on a recent trunk image;
>>>> on Pharo, however, I had some problems installing the changes because of
>>>> some differences.
>>>>
>>>> Tried on PharoCore-1.1-11106-ALPHA.image
>>>>
>>>> phase2.1.cs
>>>> - do not file in the TextEditor changes, since PharoCore doesn't have TextEditor.
>>>> - do not file in the last line (reorganizing).
>>>>
>>>> - tests fail because Pharo's String class implements neither
>>>> #squeakToUtf8
>>>> nor
>>>> #utf8ToSqueak
>>>>
>>>> Is there a uniform way to encode ANY String -> ByteString (UTF-8)
>>>> and back? What does the ANSI standard say about it? Maybe I'm using the
>>>> wrong methods?
>>>>
>>>>
>>> "3.4.6.4 - It is erroneous if stringBody contains any characters that
>>> does not exist in the implementation
>>> defined execution character set used in the representation of character
>>> objects."
>>> So, implementation defined.
>>> As far as I know, every internal String (in Squeak and Pharo) should be
>>> either Latin-1 (ByteString) or UTF-32 (WideString), with the high byte
>>> used to distinguish the language of the string.
>>>
>>> To me, sending #squeakToUtf8 and then writing with StandardFileStream
>>> instead of FileStream seems safe.
>>> As long as the ByteString's bytes are UTF-8, #utf8ToSqueak works (and in
>>> most other cases as well).
>>> In fact, it's safer than UTF8Decoder for non-UTF-8 strings; UTF8Decoder
>>> does not perform validity checks (it only reads the total number of
>>> bytes) when it encounters bytes > 127.
>>> The reason it seems (to me) to be mostly for internal use is that it
>>> silently falls back to assuming the string is already Latin-1 (i.e., the
>>> "valid" ByteString format) instead of raising an error like the stream
>>> decoder does. (Which, by the way, would be much nicer if it were a
>>> MalformedUTF8Error or some such...)
>>>
>>> ws := StandardFileStream newFileNamed: 'test.txt'.
>>> "Save as latin1"
>>> ws nextPutAll: 'ååå'.
>>> ws close.
>>>
>>> "Read with UTF8Decoder"
>>> rs := FileStream oldFileNamed: 'test.txt'.
>>> "Print this: gives '?'"
>>> rs contents.
>>> rs close.
>>>
>>> "Read with Latin1Decoder"
>>> rs := StandardFileStream oldFileNamed: 'test.txt'.
>>> "Print this: gives 'ååå', since the bytes aren't valid UTF-8 and are assumed to be Latin-1"
>>> rs contents utf8ToSqueak.
>>> rs close.
>>>
>>>> Still, I think we need this standardized and common to all
>>>> dialects (not just Pharo/Squeak).
>>>>
>>>>
>>> There's really only one way to store characters in a ByteArray (i.e. a
>>> ByteString) and call it UTF-8 encoded.
>>> As far as I can tell, Squeak does the right thing :)
>>> I believe Nicolas pushed for an implementation in Pharo some time ago;
>>> not sure what happened to that.
>>>
>>>
>> I seem to have solved this by using #convertToEncoding: / #convertFromEncoding:.
>> Tests work fine after that. However, I haven't yet tried source
>> containing non-Latin-1 characters.
>>
> Converting to UTF-8 from ByteString/WideString should not be a problem,
> as long as you know the ByteString encoding is Latin-1 (which it should
> be if it was created by any normal means).
> As long as you are SURE the string you are decoding is UTF-8 (e.g., when
> you've encoded it all yourself ;) ), #convertFromEncoding: shouldn't be
> a problem either. (See my previous mail: it uses the same decoder as
> FileStream, so it lacks the validity checks.)
>
OK, thanks for the clarification.
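For reference, here is a minimal Smalltalk sketch of the #convertToEncoding: / #convertFromEncoding: round trip discussed above. It assumes a Squeak or Pharo image where the 'utf-8' TextConverter is available; the variable names are illustrative only. Decoding is only safe here because the UTF-8 bytes were produced by the encoder itself, as noted above:

```smalltalk
| original utf8 decoded |
"A Latin-1 ByteString: three characters, each U+00E5"
original := 'ååå'.
"Encode to a ByteString holding UTF-8 bytes; each å becomes two bytes (16rC3 16rA5)"
utf8 := original convertToEncoding: 'utf-8'.
utf8 size. "6"
"Decode back; safe only because utf8 is known to hold valid UTF-8 (per the thread, no validity checks are made)"
decoded := utf8 convertFromEncoding: 'utf-8'.
decoded = original. "true"
```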

I also found other places in Pharo that use #(0 0 0 0)
as a trailer, e.g. in
addTraitSelector: aSymbol withMethod: aCompiledMethod

This needs to be fixed (as well as all other places that try to use
arrays to define a trailer).


> Cheers,
> Henry
>
>
> _______________________________________________
> Pharo-project mailing list
> [email protected]
> http://lists.gforge.inria.fr/cgi-bin/mailman/listinfo/pharo-project



-- 
Best regards,
Igor Stasenko AKA sig.
