On Nov 30, 2010, at 11:46 AM, Sven Van Caekenberghe wrote:

> I understand very well that you are very busy, Stéphane, I am not asking for
> a solution.

:) but I would love to have time :)

> It is exactly the same in Squeak.
>
> Like Philippe said, it looks wrong.

so open a ticket

> On 29 Nov 2010, at 20:12, Stéphane Ducasse wrote:
>
>> sven
>>
>> I'm terribly and more than that busy until mid or more dec.
>> Now did you check if the behavior is the same in squeak?
>> S.
>>
>> On Nov 29, 2010, at 3:23 PM, Sven Van Caekenberghe wrote:
>>
>>> Hi,
>>>
>>> TextConverter and its subclasses seem to break the contract of
>>> #nextFromStream: and #nextPut:toStream: when the stream #isBinary. Consider
>>> the following two examples:
>>>
>>> ByteArray streamContents: [ :stream | | encoder |
>>>     encoder := UTF8TextConverter new.
>>>     'élève en français' do: [ :each | encoder nextPut: each toStream: stream ] ].
>>>
>>> #[233 108 232 118 101 32 101 110 32 102 114 97 110 231 97 105 115]
>>>
>>> (String streamContents: [ :stream | | encoder |
>>>     encoder := UTF8TextConverter new.
>>>     'élève en français' do: [ :each | encoder nextPut: each toStream: stream ] ]) asByteArray.
>>>
>>> #[195 169 108 195 168 118 101 32 101 110 32 102 114 97 110 195 167 97 105 115]
>>>
>>> The first answer is incorrect, the second is correct (as far as I
>>> understand it).
>>>
>>> This is apparently on purpose, from the implementation of, for example,
>>> UTF8TextConverter>>#nextPut:toStream:
>>>
>>> nextPut: aCharacter toStream: aStream
>>>     | leadingChar nBytes mask shift ucs2code |
>>>     aStream isBinary ifTrue: [^aCharacter storeBinaryOn: aStream].
>>>     leadingChar := aCharacter leadingChar.
>>>     (leadingChar = 0 and: [aCharacter asciiValue < 128]) ifTrue: [
>>>         aStream basicNextPut: aCharacter.
>>>         ^ aStream.
>>>     ].
>>>
>>>     "leadingChar > 3 ifTrue: [^ aStream]."
>>>
>>>     ucs2code := aCharacter asUnicode.
>>>     ucs2code ifNil: [^ aStream].
>>>
>>>     nBytes := ucs2code highBit + 3 // 5.
>>>     mask := #(128 192 224 240 248 252 254 255) at: nBytes.
>>>     shift := nBytes - 1 * -6.
>>>     aStream basicNextPut: (Character value: (ucs2code bitShift: shift) + mask).
>>>     2 to: nBytes do: [:i |
>>>         shift := shift + 6.
>>>         aStream basicNextPut: (Character value: ((ucs2code bitShift: shift) bitAnd: 63) + 128).
>>>     ].
>>>
>>>     ^ aStream.
>>>
>>> I would say that the contract of #nextPut:toStream: is to take a Character
>>> object and write a binary representation using a specific encoding to a
>>> stream. However, when given a #isBinary stream, it does no longer do any
>>> encoding at all !
>>>
>>> The same is true for the other converters as well as for #nextFromStream:.
>>>
>>> Does anyone know why that is the case ?
>>>
>>> And if it is by design, how should one do UTF8 encoding on a binary stream
>>> ??
>>>
>>> Thx,
>>>
>>> Sven
>>>
>>> PS: If others also think this is strange, I could make an issue, I am just
>>> not sure this is a bug.
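
A possible interim workaround for the question quoted above (how to get UTF-8 bytes onto a binary stream), sketched using only the messages that already appear in the report plus the standard #nextPutAll:. It assumes, as the second example in the report suggests, that the converter encodes correctly when the target stream is not binary; the variable names are illustrative only, and this is not a fix for the converter itself.

```smalltalk
| encoder encodedString utf8Bytes |
encoder := UTF8TextConverter new.

"Step 1: encode into a non-binary (String) stream, where the converter
 takes the encoding path instead of the #isBinary shortcut."
encodedString := String streamContents: [ :stream |
	'élève en français' do: [ :each |
		encoder nextPut: each toStream: stream ] ].

"Step 2: reinterpret the encoded characters as raw bytes."
utf8Bytes := encodedString asByteArray.

"Step 3: write the already-encoded bytes to a binary stream,
 bypassing the converter entirely."
ByteArray streamContents: [ :binaryStream |
	binaryStream nextPutAll: utf8Bytes ]
```

According to the report, step 2 should yield the byte array starting with 195 169 for the leading 'é', rather than the single byte 233 produced when the converter writes to a binary stream directly.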
