Hi,

TextConverter and its subclasses seem to break the contract of #nextFromStream: 
and #nextPut:toStream: when the stream answers true to #isBinary. Consider the 
following two examples:

ByteArray streamContents: [ :stream | | encoder |
        encoder := UTF8TextConverter new.
        'élève en français' do: [ :each | encoder nextPut: each toStream: stream ] ].

 #[233 108 232 118 101 32 101 110 32 102 114 97 110 231 97 105 115]

(String streamContents: [ :stream | | encoder |
        encoder := UTF8TextConverter new.
        'élève en français' do: [ :each | encoder nextPut: each toStream: stream ] ]) asByteArray.

 #[195 169 108 195 168 118 101 32 101 110 32 102 114 97 110 195 167 97 105 115]

The first answer is incorrect; the second is correct (as far as I understand 
it). In UTF-8, é should become the two bytes 195 169, not the single byte 233.

This is apparently on purpose, judging from the implementation of, for example, 
UTF8TextConverter>>#nextPut:toStream:

nextPut: aCharacter toStream: aStream 
        | leadingChar nBytes mask shift ucs2code |
        aStream isBinary ifTrue: [^aCharacter storeBinaryOn: aStream].
        leadingChar := aCharacter leadingChar.
        (leadingChar = 0 and: [aCharacter asciiValue < 128]) ifTrue: [
                aStream basicNextPut: aCharacter.
                ^ aStream.
        ].

        "leadingChar > 3 ifTrue: [^ aStream]."

        ucs2code := aCharacter asUnicode.
        ucs2code ifNil: [^ aStream].

        nBytes := ucs2code highBit + 3 // 5.
        mask := #(128 192 224 240 248 252 254 255) at: nBytes.
        shift := nBytes - 1 * -6.
        aStream basicNextPut: (Character value: (ucs2code bitShift: shift) + mask).
        2 to: nBytes do: [:i | 
                shift := shift + 6.
                aStream basicNextPut: (Character value: ((ucs2code bitShift: shift) bitAnd: 63) + 128).
        ].

        ^ aStream.
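Tracing the non-binary branch of the method above by hand for é (code point 233) reproduces the first two bytes of the second result. The binary branch skips all of this and lets #storeBinaryOn: write the code point itself, which, as the first example shows, comes out as the single byte 233. The symbols below are only labels for the method's temporaries (nBytes, mask, shift).

```latex
\begin{aligned}
\mathrm{highBit}(233) &= 8, \quad n_{\mathrm{bytes}} = \lfloor (8 + 3) / 5 \rfloor = 2, \quad \mathrm{mask} = 192, \quad \mathrm{shift} = (2 - 1) \cdot (-6) = -6 \\
\mathrm{byte}_1 &= \lfloor 233 / 2^{6} \rfloor + 192 = 3 + 192 = 195 \\
\mathrm{byte}_2 &= (233 \bmod 2^{6}) + 128 = 41 + 128 = 169
\end{aligned}
```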

I would say that the contract of #nextPut:toStream: is to take a Character 
object and write a binary representation of it, using a specific encoding, to a 
stream. However, when given an #isBinary stream, it no longer does any encoding 
at all!

The same is true for the other converters as well as for #nextFromStream:. 

Does anyone know why that is the case?

And if it is by design, how should one do UTF-8 encoding on a binary stream?
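One workaround, sketched here under the assumption that a stream on a String is never #isBinary in the image at hand, is to go through an intermediate String stream, exactly as the second example does, and only convert to bytes afterwards. The temporaries (encoded, bytes, decoded, in, out) are illustrative names; the decoding half has not been checked against characters with a non-zero leadingChar.

```smalltalk
"Workaround sketch, not a fix: let the converter write into a String
 stream (not #isBinary), then copy the resulting bytes to the binary stream."
| encoder bytes decoded |
encoder := UTF8TextConverter new.
bytes := ByteArray streamContents: [ :binaryStream | | encoded |
        "Encode into a String first, as in the second example above"
        encoded := String streamContents: [ :stringStream |
                'élève en français' do: [ :each |
                        encoder nextPut: each toStream: stringStream ] ].
        "Each character of encoded carries one UTF-8 byte value (0-255)"
        binaryStream nextPutAll: encoded asByteArray ].

"The same trick in the other direction: read the bytes through a String
 stream so that #nextFromStream: takes its decoding path."
decoded := String streamContents: [ :out | | in |
        in := bytes asString readStream.
        [ in atEnd ] whileFalse: [ out nextPut: (encoder nextFromStream: in) ] ].
```

This costs an extra copy of the data, which is precisely what one would want to avoid when writing directly to a binary stream.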

Thx,

Sven

PS: If others also think this is strange, I could create an issue; I am just not 
sure this is a bug.


