I understand very well that you are very busy, Stéphane; I am not asking for a solution.
It is exactly the same in Squeak. Like Philippe said, it looks wrong.

On 29 Nov 2010, at 20:12, Stéphane Ducasse wrote:

> sven
>
> I'm terribly and more than that busy until mid or more dec.
> Now did you check if the behavior is the same in squeak?
> S.
>
> On Nov 29, 2010, at 3:23 PM, Sven Van Caekenberghe wrote:
>
>> Hi,
>>
>> TextConverter and its subclasses seem to break the contract of
>> #nextFromStream: and #nextPut:toStream: when the stream #isBinary. Consider
>> the following two examples:
>>
>> ByteArray streamContents: [ :stream | | encoder |
>>     encoder := UTF8TextConverter new.
>>     'élève en français' do: [ :each | encoder nextPut: each toStream: stream ] ].
>>
>> #[233 108 232 118 101 32 101 110 32 102 114 97 110 231 97 105 115]
>>
>> (String streamContents: [ :stream | | encoder |
>>     encoder := UTF8TextConverter new.
>>     'élève en français' do: [ :each | encoder nextPut: each toStream: stream ] ]) asByteArray.
>>
>> #[195 169 108 195 168 118 101 32 101 110 32 102 114 97 110 195 167 97 105 115]
>>
>> The first answer is incorrect, the second is correct (as far as I understand
>> it).
>>
>> This is apparently on purpose, from the implementation of, for example,
>> UTF8TextConverter>>#nextPut:toStream:
>>
>> nextPut: aCharacter toStream: aStream
>>     | leadingChar nBytes mask shift ucs2code |
>>     aStream isBinary ifTrue: [^aCharacter storeBinaryOn: aStream].
>>     leadingChar := aCharacter leadingChar.
>>     (leadingChar = 0 and: [aCharacter asciiValue < 128]) ifTrue: [
>>         aStream basicNextPut: aCharacter.
>>         ^ aStream.
>>     ].
>>
>>     "leadingChar > 3 ifTrue: [^ aStream]."
>>
>>     ucs2code := aCharacter asUnicode.
>>     ucs2code ifNil: [^ aStream].
>>
>>     nBytes := ucs2code highBit + 3 // 5.
>>     mask := #(128 192 224 240 248 252 254 255) at: nBytes.
>>     shift := nBytes - 1 * -6.
>>     aStream basicNextPut: (Character value: (ucs2code bitShift: shift) + mask).
>>     2 to: nBytes do: [:i |
>>         shift := shift + 6.
>>         aStream basicNextPut: (Character value: ((ucs2code bitShift: shift) bitAnd: 63) + 128).
>>     ].
>>
>>     ^ aStream.
>>
>> I would say that the contract of #nextPut:toStream: is to take a Character
>> object and write a binary representation using a specific encoding to a
>> stream. However, when given a #isBinary stream, it does no longer do any
>> encoding at all !
>>
>> The same is true for the other converters as well as for #nextFromStream:.
>>
>> Does anyone know why that is the case ?
>>
>> And if it is by design, how should one do UTF8 encoding on a binary stream ??
>>
>> Thx,
>>
>> Sven
>>
>> PS: If others also think this is strange, I could make an issue, I am just
>> not sure this is a bug.
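For what it is worth, one possible stopgap while the #isBinary behaviour stays as it is: encode into a String stream first, as in the second example of the quoted message, and only then copy the resulting bytes onto the binary stream. This is just a sketch, not a proposed fix. It assumes that `WriteStream on: ByteArray new` stands in for whatever binary stream is really being written to, and the variable names are made up for illustration.

```smalltalk
"Sketch of a workaround: never hand the binary stream to the converter.
 The converter only sees a String stream, so the #isBinary shortcut is not taken."
| converter encoded binaryStream |
converter := UTF8TextConverter new.

"Step 1: encode into a String whose characters are the UTF-8 byte values."
encoded := String streamContents: [ :stream |
	'élève en français' do: [ :each |
		converter nextPut: each toStream: stream ] ].

"Step 2: copy those byte values onto the binary stream (hypothetical target)."
binaryStream := WriteStream on: ByteArray new.
binaryStream nextPutAll: encoded asByteArray.

"If the converter behaves as described in the quoted message, this should
 match the second byte array shown there, not the first."
binaryStream contents
```

The cost is an intermediate String per write, so this only makes sense as a temporary measure until it is decided whether the current behaviour is a bug.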
