I understand very well that you are very busy, Stéphane; I am not asking for a solution.

It is exactly the same in Squeak.

Like Philippe said, it looks wrong.

On 29 Nov 2010, at 20:12, Stéphane Ducasse wrote:

> Sven,
> 
> I'm terribly busy (more than that!) until mid-December or later.
> Did you check whether the behavior is the same in Squeak?
> S.
> 
> On Nov 29, 2010, at 3:23 PM, Sven Van Caekenberghe wrote:
> 
>> Hi,
>> 
>> TextConverter and its subclasses seem to break the contract of 
>> #nextFromStream: and #nextPut:toStream: when the stream #isBinary. Consider 
>> the following two examples:
>> 
>> ByteArray streamContents: [ :stream | | encoder |
>>      encoder := UTF8TextConverter new.
>>      'élève en français' do: [ :each | encoder nextPut: each toStream: stream ] ].
>> 
>> #[233 108 232 118 101 32 101 110 32 102 114 97 110 231 97 105 115]
>> 
>> (String streamContents: [ :stream | | encoder |
>>      encoder := UTF8TextConverter new.
>>      'élève en français' do: [ :each | encoder nextPut: each toStream: stream ] ]) asByteArray.
>> 
>> #[195 169 108 195 168 118 101 32 101 110 32 102 114 97 110 195 167 97 105 115]
>> 
>> The first answer is incorrect, the second is correct (as far as I understand 
>> it).
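A quick way to see what the first result contains: writing each character's Unicode code point as a single byte reproduces it exactly, which suggests that on the binary stream no UTF-8 encoding took place at all. This is a workspace sketch assuming a Pharo/Squeak image of this era, where Character>>#asUnicode answers the code point (the converter method quoted below relies on it as well); it only works here because every character in the string is below 256.

```smalltalk
| codePoints |
"Hypothetical check, not part of TextConverter: put one byte per character,
 namely its Unicode code point (valid here because all of them are below 256)"
codePoints := ByteArray streamContents: [ :stream |
	'élève en français' do: [ :each | stream nextPut: each asUnicode ] ].
codePoints
```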
>> 
>> This is apparently on purpose, judging from the implementation of, for example, 
>> UTF8TextConverter>>#nextPut:toStream:
>> 
>> nextPut: aCharacter toStream: aStream 
>>      | leadingChar nBytes mask shift ucs2code |
>>      aStream isBinary ifTrue: [^aCharacter storeBinaryOn: aStream].
>>      leadingChar := aCharacter leadingChar.
>>      (leadingChar = 0 and: [aCharacter asciiValue < 128]) ifTrue: [
>>              aStream basicNextPut: aCharacter.
>>              ^ aStream.
>>      ].
>> 
>>      "leadingChar > 3 ifTrue: [^ aStream]."
>> 
>>      ucs2code := aCharacter asUnicode.
>>      ucs2code ifNil: [^ aStream].
>> 
>>      nBytes := ucs2code highBit + 3 // 5.
>>      mask := #(128 192 224 240 248 252 254 255) at: nBytes.
>>      shift := nBytes - 1 * -6.
>>      aStream basicNextPut: (Character value: (ucs2code bitShift: shift) + mask).
>>      2 to: nBytes do: [:i | 
>>              shift := shift + 6.
>>              aStream basicNextPut: (Character value: ((ucs2code bitShift: shift) bitAnd: 63) + 128).
>>      ].
>> 
>>      ^ aStream.
>> 
>> I would say that the contract of #nextPut:toStream: is to take a Character 
>> object and write a binary representation of it, using a specific encoding, 
>> to a stream. However, when given an #isBinary stream, it no longer does any 
>> encoding at all!
>> 
>> The same is true for the other converters as well as for #nextFromStream:. 
>> 
>> Does anyone know why that is the case?
>> 
>> And if it is by design, how should one do UTF-8 encoding on a binary stream?
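One possible workaround, sketched under the same assumptions (a Pharo/Squeak image where Character>>#asUnicode and Integer>>#highBit behave as in the method quoted above): repeat the UTF-8 branch of UTF8TextConverter>>#nextPut:toStream:, but put Integers (bytes) on the binary stream instead of Characters via #basicNextPut:. This is a hypothetical workspace snippet, not an existing TextConverter API; for this string it reproduces the second, correct result.

```smalltalk
| bytes |
"Hypothetical workaround, not part of TextConverter: the same bit arithmetic as
 the non-binary branch of UTF8TextConverter>>#nextPut:toStream:, writing bytes"
bytes := ByteArray streamContents: [ :stream |
	'élève en français' do: [ :each | | code nBytes mask shift |
		code := each asUnicode.
		code < 128
			ifTrue: [ stream nextPut: code ]	"ASCII: a single byte"
			ifFalse: [
				"number of bytes needed, and the lead byte marker bits"
				nBytes := code highBit + 3 // 5.
				mask := #(128 192 224 240 248 252 254 255) at: nBytes.
				shift := nBytes - 1 * -6.
				stream nextPut: (code bitShift: shift) + mask.
				"continuation bytes: 10xxxxxx, six payload bits each"
				2 to: nBytes do: [ :i |
					shift := shift + 6.
					stream nextPut: ((code bitShift: shift) bitAnd: 63) + 128 ] ] ] ].
bytes
```

For é (code point 233, highBit 8) this gives nBytes = (8 + 3) // 5 = 2, a lead byte of 3 + 192 = 195 and a continuation byte of 41 + 128 = 169, matching the first two bytes of the second result.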
>> 
>> Thx,
>> 
>> Sven
>> 
>> PS: If others also think this is strange, I could open an issue; I am just 
>> not sure this is a bug.