Hmm, switching #ascii <-> #binary only makes sense in... ASCII.
With any other encoding it does not make sense at all, or maybe it would have to be #latin1 <-> #binary, #utf8 <-> #binary, #utf16 <-> #binary.
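
A minimal sketch of the point, assuming a Pharo image with Zinc's ZnUTF8Encoder available (an assumption for illustration; nothing in the change under discussion uses it). A one-to-one swap between characters and bytes only works while every character fits in a single byte. With UTF-8 the sizes already differ for one accented character:

```smalltalk
| text latin1Bytes utf8Bytes |
"One character, code point 233 (e with acute accent)."
text := String with: (Character value: 233).

"Byte-for-character view: works only because 233 fits in one byte."
latin1Bytes := text asByteArray.

"UTF-8 view: the same character needs two bytes (195 169)."
utf8Bytes := ZnUTF8Encoder new encodeString: text.

latin1Bytes size.   "1"
utf8Bytes size.     "2"

"Decoding with the matching encoder recovers the original text."
(ZnUTF8Encoder new decodeBytes: utf8Bytes) = text.
```

So a plain #ascii / #binary toggle that just reinterprets the buffer cannot be correct for UTF-8 or UTF-16 content; an explicit encoder has to sit in between.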

2013/12/5 Max Leske <[email protected]>

> There are several different approaches in different places:
>
> - FileStream reads strings by default. #binary and #ascii switch between
>   formats. File streams use an internal buffer which is either a String
>   (default) or a ByteArray. It’s even possible to switch between binary and
>   ascii midstream without losing information (if done right) because it only
>   affects the buffer.
> - ReadStream and WriteStream cannot change their format. Their behavior is
>   determined by the underlying collection. Forcing conversions (e.g. by
>   #asString) can lead to loss of information
> - RWBinaryOrTextStream (and other subclasses of ReadWriteStream) also
>   support the #binary #ascii method of switching format. Default is #ascii
> - SocketStream uses the same #binary / #ascii mechanism. Default is #ascii
> - ZnLimitedReadStream uses the same #binary / #ascii mechanism. Default is
>   #binary (implicit); depends on the underlying stream
>
> I think the pattern to follow is clear: ReadStream and WriteStream should
> allow switching format with #ascii and #binary, default should be #ascii.
> However, I suspect there’s a reason that these classes don’t support
> switching, namely that switching makes the implementation more complicated
> and also slower because more checks need to be made.
>
> The easiest solution I see would be to implement something like this:
>
> ReadStream>>next
>     ^ self isBinary
>         ifTrue: [ self basicNext asCharacter ]
>         ifFalse: [ self basicNext ]
>
> However, #next et al. are implemented in a plugin and the primitive method
> looks like this:
>
> ReadStream>>next
>     <primitive: 65>
>     position >= readLimit
>         ifTrue: [^nil]
>         ifFalse: [^collection at: (position := position + 1)]
>
> This means the collection instance variable has to hold either a binary or
> a string collection.
>
> I’ve found a solution which would work and I’ve whipped up a working way
> (there’s space for improvement…):
>
> ReadStream>>binary
>     collection isString ifFalse: [ ^ self ].
>     collection := (ByteArray new: collection size) copyReplaceFrom: 1 to: collection size with: collection
>
> ReadStream>>ascii
>     collection isString ifTrue: [ ^ self ].
>     collection := (String new: collection size) copyReplaceFrom: 1 to: collection size with: collection
>
> @Damien
> opposed to what I wrote earlier, #asString does *not* destroy
> non-printable characters. Instead, every byte (from 0 to 255) is encoded as
> a character and thus the string can be converted back to a ByteArray
> *without* loss of information. Sorry about that.
>
> With this change in place the 12259 would become obsolete.
>
> Please let me know what you think. This is a pretty big change that might
> have a lot of consequences in the image.
>
> Cheers,
> Max
>
> On 04.12.2013, at 13:14, Max Leske <[email protected]> wrote:
>
> Let me see what I can come up with.
>
>
> On 03.12.2013, at 19:36, Damien Cassou <[email protected]> wrote:
>
> Thanks Max for the report. Do you have an idea on how we could solve the
> problem? The previous behaviour was not acceptable either because the
> streams that came out of a memory filesystem were the only ones with binary
> content
> On Dec 3, 2013 5:35 PM, "Max Leske" <[email protected]> wrote:
>
>> Damien, Marcus
>>
>> this change breaks a lot of things in FileSystem-Git. I don’t disagree
>> with the idea that reading characters should be default (one could argue
>> about it…) but your change makes it IMPOSSIBLE to read bytes because
>> unprintable characters are discarded! So if my ByteArray is a NULL
>> terminated string, for instance, I can not check for the NULL termination
>> anymore.
>>
>> Cheers,
>> Max
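
The NULL-termination concern and the later correction about #asString in the thread above can be checked with a small sketch. It assumes a stock Pharo image in which ByteArray>>asString and String>>asByteArray map each byte to exactly one character and back, as Max describes; the byte values are made up for illustration.

```smalltalk
| bytes stream text |
"A NUL-terminated 'Hi' as raw bytes (hypothetical sample data)."
bytes := #[72 105 0].

"A ReadStream simply hands out whatever its collection holds: here, integers."
stream := ReadStream on: bytes.
stream next.   "72"

"Byte-wise conversion keeps the unprintable NUL as a character."
text := bytes asString.
text size.                          "3"
text last = (Character value: 0).   "true"

"Converting back yields the original bytes, so nothing was lost."
text asByteArray = bytes.           "true"
```

If a stream silently dropped the trailing NUL instead, the last two checks would fail, which is the breakage reported for FileSystem-Git.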
