On 20.12.2009 20:04, Igor Stasenko wrote:
> Hello,
> I finished this stuff, and it's ready for adoption.
>    
Nice!
> See http://bugs.squeak.org/view.php?id=7428
>
> Does anyone want to help push it into the trunk update stream (using MC configs)?
>
> It works fine on a recent trunk image;
> on Pharo, however, I had some problems installing the changes because of
> some differences.
>
> Tried on PharoCore-1.1-11106-ALPHA.image
>
> phase2.1.cs
> - do not file in the TextEditor changes, since pharo-core doesn't have it.
> - do not file in the last line (reorganizing).
>
> - tests fail because Pharo's String class does not implement
> #squeakToUtf8
> nor
> #utf8ToSqueak
>
> Do we have a uniform way to encode ANY String -> ByteString (utf8)
> and back? What does the ANSI standard say about it? Maybe I'm using the wrong methods?
>    
"3.4.6.4 - It is erroneous if stringBody contains any characters that 
does not exist in the implementation
defined execution character set used in the representation of character 
objects."
So, implementation defined.
As far as I know, every internal String (in Squeak and Pharo) should be
either latin1 (ByteStrings) or utf32, with the high byte used to
differentiate the language of the string.

To me, sending squeakToUtf8 and then using StandardFileStream instead of
FileStream seems safe.
As long as the ByteString's bytes are utf8, utf8ToSqueak works (and in
most other cases as well).
In fact, for non-utf8 strings it is safer than UTF8Decoder, which does
not perform validity checks (it only reads the total number of bytes)
when encountering bytes > 127.
The reason it seems mostly for internal use (to me) is that it silently
falls back to assuming the string is already in latin1 (i.e., the
"valid" ByteString format) instead of raising an error like the stream
decoder does. (That error, by the way, would be much nicer if it were a
MalformedUTF8Error or some such...)
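
To make the round trip and the fallback concrete, here is a minimal
workspace sketch. It assumes a Squeak trunk image where String
understands #squeakToUtf8 and #utf8ToSqueak (the Pharo image mentioned
above does not); the temporary names are only for illustration. The last
expression relies on the silent fallback described above, so treat it as
an assumption rather than something checked against the converter source:

```smalltalk
| original encoded roundTrip latin1Bytes |
"Three non-ASCII latin1 characters."
original := 'ååå'.

"Encode to utf8: each å (code point 229) becomes two bytes."
encoded := original squeakToUtf8.
encoded size.   "print it: expected 6"

"Decoding valid utf8 gives back the original string."
roundTrip := encoded utf8ToSqueak.
roundTrip = original.   "print it: expected true"

"A lone byte 229 is not valid utf8. Assumption: utf8ToSqueak then
 answers the receiver unchanged instead of raising an error."
latin1Bytes := String with: (Character value: 229).
latin1Bytes utf8ToSqueak = latin1Bytes.
```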

ws := StandardFileStream newFileNamed: 'test.txt'.
"Save as latin1"
ws nextPutAll: 'ååå'.
ws close.

"Read with UTF8Decoder"
rs := FileStream oldFileNamed: 'test.txt'.
"Print this: gives a ?"
rs contents.
rs close.

"Read with Latin1Decoder"
rs := StandardFileStream oldFileNamed: 'test.txt'.
"Print this: gives ååå, since the bytes are not valid utf8 and latin1 is assumed"
rs contents utf8ToSqueak.
rs close.
> Still, I think we need this standardized and common to all
> dialects (not just Pharo/Squeak).
>    
There's really only one way to store characters in a ByteArray (i.e.
ByteString) and call it utf8 encoded.
As far as I can tell, Squeak seems to do the right thing :)
I believe Nicolas pushed for an implementation in Pharo some time ago;
not sure what happened to that.
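
As a small illustration of why the byte layout is fixed, the sketch
below builds the two utf8 bytes for å (code point 229) by hand, using
only integer arithmetic, so it does not depend on #squeakToUtf8. The
variable names are made up for the example, and the two-byte form shown
only applies to code points from 128 to 2047:

```smalltalk
| codePoint leadByte continuationByte |
codePoint := 229.   "å in latin1 / Unicode"

"Lead byte: 110xxxxx, carrying the top bits of the code point."
leadByte := 2r11000000 bitOr: (codePoint bitShift: -6).

"Continuation byte: 10xxxxxx, carrying the low six bits."
continuationByte := 2r10000000 bitOr: (codePoint bitAnd: 2r00111111).

{ leadByte. continuationByte }.   "print it: 195 and 165 (16rC3 16rA5)"
```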

Cheers,
Henry

_______________________________________________
Pharo-project mailing list
[email protected]
http://lists.gforge.inria.fr/cgi-bin/mailman/listinfo/pharo-project
