On Nov 30, 2010, at 11:46 AM, Sven Van Caekenberghe wrote:

> I understand very well that you are very busy, Stéphane; I am not asking for 
> a solution.

:)
But I would love to have the time :)

> 
> It is exactly the same in Squeak.
> 
> Like Philippe said, it looks wrong.

So open a ticket.
> 
> On 29 Nov 2010, at 20:12, Stéphane Ducasse wrote:
> 
>> Sven,
>> 
>> I'm terribly busy, and more than that, until mid-December or later.
>> That said, did you check whether the behavior is the same in Squeak?
>> S.
>> 
>> On Nov 29, 2010, at 3:23 PM, Sven Van Caekenberghe wrote:
>> 
>>> Hi,
>>> 
>>> TextConverter and its subclasses seem to break the contract of 
>>> #nextFromStream: and #nextPut:toStream: when the stream #isBinary. Consider 
>>> the following two examples:
>>> 
>>> ByteArray streamContents: [ :stream | | encoder |
>>>     encoder := UTF8TextConverter new.
>>>     'élève en français' do: [ :each | encoder nextPut: each toStream: stream ] ].
>>> 
>>> #[233 108 232 118 101 32 101 110 32 102 114 97 110 231 97 105 115]
>>> 
>>> (String streamContents: [ :stream | | encoder |
>>>     encoder := UTF8TextConverter new.
>>>     'élève en français' do: [ :each | encoder nextPut: each toStream: stream ] ]) asByteArray.
>>> 
>>> #[195 169 108 195 168 118 101 32 101 110 32 102 114 97 110 195 167 97 105 115]
>>> 
>>> The first answer is incorrect; the second is correct (as far as I 
>>> understand it).
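For comparison, the bytes in the first result are just the characters' own code points, written out one byte each with no UTF-8 encoding applied. A quick workspace check shows this (assuming an image where these characters are in the Latin-1 range, i.e. leadingChar 0):

    $é asInteger.   "233"
    $è asInteger.   "232"
    $ç asInteger.   "231"

Those are exactly the values 233, 232 and 231 that appear in the first byte array, whereas UTF-8 needs two bytes for each of them (195 169, 195 168 and 195 167), as the second result shows.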
>>> 
>>> This is apparently on purpose, judging from the implementation of, for example, 
>>> UTF8TextConverter>>#nextPut:toStream:
>>> 
>>> nextPut: aCharacter toStream: aStream 
>>>     | leadingChar nBytes mask shift ucs2code |
>>>     aStream isBinary ifTrue: [^aCharacter storeBinaryOn: aStream].
>>>     leadingChar := aCharacter leadingChar.
>>>     (leadingChar = 0 and: [aCharacter asciiValue < 128]) ifTrue: [
>>>             aStream basicNextPut: aCharacter.
>>>             ^ aStream.
>>>     ].
>>> 
>>>     "leadingChar > 3 ifTrue: [^ aStream]."
>>> 
>>>     ucs2code := aCharacter asUnicode.
>>>     ucs2code ifNil: [^ aStream].
>>> 
>>>     nBytes := ucs2code highBit + 3 // 5.
>>>     mask := #(128 192 224 240 248 252 254 255) at: nBytes.
>>>     shift := nBytes - 1 * -6.
>>>     aStream basicNextPut: (Character value: (ucs2code bitShift: shift) + mask).
>>>     2 to: nBytes do: [:i | 
>>>             shift := shift + 6.
>>>             aStream basicNextPut: (Character value: ((ucs2code bitShift: shift) bitAnd: 63) + 128).
>>>     ].
>>> 
>>>     ^ aStream.
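Working through the multi-byte branch by hand for é (code point 233) shows that this is the code path that produces the correct second result. The expressions below simply mirror the method's own arithmetic for nBytes, mask and shift and can be evaluated in a workspace; the commented values are the results:

    233 highBit.                                "8"
    (233 highBit + 3) // 5.                     "2, so nBytes = 2"
    #(128 192 224 240 248 252 254 255) at: 2.   "192, the mask"
    (233 bitShift: -6) + 192.                   "195, the first byte (shift = -6)"
    ((233 bitShift: 0) bitAnd: 63) + 128.       "169, the second byte (shift = 0)"

That is the 195 169 at the start of the second byte array. The early `aStream isBinary ifTrue:` return skips all of this, which is why the binary stream receives the single byte 233 instead.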
>>> 
>>> I would say that the contract of #nextPut:toStream: is to take a Character 
>>> object and write a binary representation of it, using a specific encoding, 
>>> to a stream. However, when given an #isBinary stream, it no longer does any 
>>> encoding at all!
>>> 
>>> The same is true for the other converters as well as for #nextFromStream:. 
>>> 
>>> Does anyone know why that is the case?
>>> 
>>> And if it is by design, how should one do UTF-8 encoding on a binary 
>>> stream?
>>> 
>>> Thx,
>>> 
>>> Sven
>>> 
>>> PS: If others also think this is strange, I could open an issue; I am just 
>>> not sure this is a bug.

