On 04 Jul 2011, at 20:44, Guillermo Polito wrote:

> Actually, this testcase may be wrong, but reproduces the problem when doing a
> fileout with tildes and ñ's in packages or authors...
>
> BTW, it does not fail on the assert, it raises an exception when sending the
> #nextFromStream: :S
Well, these are correct test cases:

    | converter string |
    converter := UTF8TextConverter new.
    string := String streamContents: [ :stream |
        converter nextPut: $a toStream: stream ].
    $a = (converter nextFromStream: string readStream).

    | converter string character |
    converter := UTF8TextConverter new.
    character := Character value: 241. "lowercase n with diacritical tilde, in HTML &ntilde;"
    string := String streamContents: [ :stream |
        converter nextPut: character toStream: stream ].
    character = (converter nextFromStream: string readStream).

The silly/stupid thing with the TextConverter hierarchy is that it encodes characters onto a character stream that it treats as a binary stream (i.e. a byte with value 200 decimal is stored as a Character with value 200). To decode, it needs a character stream, but it treats it as if it contained bytes! If you replace String with ByteArray in the above, it completely fails to act as expected. Look into the code and you'll be surprised.

On the other hand, the ZnCharacterEncoder hierarchy acts as a real (and simpler) encoder/decoder from characters to bytes and vice versa:

    | converter bytes character |
    converter := ZnUTF8Encoder new.
    character := Character value: 241. "lowercase n with diacritical tilde, in HTML &ntilde;"
    bytes := ByteArray streamContents: [ :stream |
        converter nextPut: character toStream: stream ].
    character = (converter nextFromStream: bytes readStream).

BTW, the main reason for introducing the ZnCharacterEncoder hierarchy was that I needed a way to compute how many bytes a string would need once encoded, before actually encoding it (see #encodedByteCountFor:), which is a non-trivial operation for a variable-length encoding like UTF-8; the messed-up API was another reason.

Sven

PS: I have *not* said that there is an encoding fault in UTF8TextConverter, just that the API is freaky.
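The difference between the two conventions described above is not specific to Smalltalk. The following Python sketch is an illustration only, not Pharo code, and every function name in it is hypothetical. It contrasts a real characters-to-bytes encoder (the contract ZnUTF8Encoder is described as having) with the TextConverter-style convention of storing each encoded byte as a character with the same value. It also shows one way to count UTF-8 bytes before encoding, in the spirit of #encodedByteCountFor: (the actual Pharo implementation is not reproduced here).

```python
# Illustration in Python, NOT Pharo: all function names are hypothetical.


def utf8_byte_count(text: str) -> int:
    """Count the UTF-8 bytes text would need, without encoding it.

    Hypothetical helper, similar in spirit to #encodedByteCountFor:.
    UTF-8 is variable length, so each code point is inspected.
    (Assumes text has no lone surrogates, which UTF-8 cannot encode.)
    """
    count = 0
    for ch in text:
        cp = ord(ch)
        if cp < 0x80:        # U+0000..U+007F: 1 byte
            count += 1
        elif cp < 0x800:     # U+0080..U+07FF: 2 bytes
            count += 2
        elif cp < 0x10000:   # U+0800..U+FFFF: 3 bytes
            count += 3
        else:                # U+10000..U+10FFFF: 4 bytes
            count += 4
    return count


def encode_to_bytes(text: str) -> bytes:
    """Characters in, real bytes out (ZnCharacterEncoder-style contract)."""
    return text.encode("utf-8")


def encode_to_byte_chars(text: str) -> str:
    """Characters in, characters out: each UTF-8 byte value is stored as a
    character with the same code point (the TextConverter quirk)."""
    return "".join(chr(b) for b in text.encode("utf-8"))


def decode_byte_chars(byte_chars: str) -> str:
    """Takes a character string but treats every character as a byte.
    Raises ValueError if a character is above 255, i.e. not a 'byte'."""
    return bytes(ord(c) for c in byte_chars).decode("utf-8")


n_tilde = chr(241)  # lowercase n with tilde, as in the Smalltalk examples
print(encode_to_bytes(n_tilde))                          # b'\xc3\xb1'
print([ord(c) for c in encode_to_byte_chars(n_tilde)])   # [195, 177]
print(utf8_byte_count(n_tilde))                          # 2
```

The point of the comparison is the types: the first encoder's output has a different type (bytes) from its input (characters), so mixing them up fails loudly, whereas the byte-as-character convention produces a string that looks like text but is really a disguised byte sequence.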
