On Dec 5, 2013 10:50 PM, "Max Leske" <[email protected]> wrote: > > There are several different approaches in different places: > > - FileStream reads strings by default. #binary and #ascii switch between formats. File streams use an internal buffer which is either a String (default) or a ByteArray. It’s even possible to switch between binary and ascii midstream without losing information (if done right) because it only affects the buffer. > - ReadStream and WriteStream cannot change their format. Their behavior is determined by the underlying collection. Forcing conversions (e.g. by #asString) can lead to loss of information > - RWBinaryOrTextStream (and other subclasses of ReadWriteStream) also support the #binary #ascii method of switching format. Default is #ascii > - SocketStream uses the same #binary / #ascii mechanism. Default is #ascii > - ZnLimitedReadStream uses the same #binary / #ascii mechanism. Default is #binary (implicit); depends on the underlying stream > > I think the pattern to follow is clear: ReadStream and WriteStream should allow switching format with #ascii and #binary, default should be #ascii. However, I suspect there’s a reason that these classes don’t support switching, namely that switching makes the implementation more complicated and also slower because more checks need to be made. > > The easiest solution I see would be to implement something like this: > > ReadStream>>next > ^ self isBinary > ifTrue: [ self basicNext asCharacter ] > ifFalse: [ self basicNext ] > > However, #next et al. are implemented in a plugin and the primitive method looks like this: > > ReadStream>>next > <primitive: 65> > position >= readLimit > ifTrue: [^nil] > ifFalse: [^collection at: (position := position + 1)] > > This means the collection instance variable has to hold either a binary or a string collection. > > I’ve found a solution which would work and I’ve whipped up a working way (there’s space for improvement…): > > ReadStream>>binary > collection isString ifFalse: [ ^ self ]. > collection := (ByteArray new: collection size) copyReplaceFrom: 1 to: collection size with: collection > > ReadStrem>>ascii > collection isString ifTrue: [ ^ self ]. > collection := (String new: collection size) copyReplaceFrom: 1 to: collection size with: collection > > @Damien > opposed to what I wrote earlier, #asString does *not* destroy non-printable characters. Instead, every byte (from 0 to 255) is encoded as a character and thus the string can be converted back to a ByteArray *without* loss of information. Sorry about that.
Thank you very much for your analysis. I'm a bit reluctant to change ReadStream now, but I could be ok with enough unit tests. Another potential solution: What about adding a #binaryReadStream method to the memory file system as a workaround before the introduction of xstreams ? > With this change in place the 12259 would become obsolete. > > Please let me know what you think. This is a pretty big change that might have a lot of consequences in the image. > > Cheers, > Max > > On 04.12.2013, at 13:14, Max Leske <[email protected]> wrote: > >> Let me see what I can come up with. >> >> >> On 03.12.2013, at 19:36, Damien Cassou <[email protected]> wrote: >> >>> Thanks Max for the report. Do you have an idea on how we could solve the problem ? The previous behaviour was not acceptable either because the streams that came out of a memory filesystem were the only ones with binary content >>> >>> On Dec 3, 2013 5:35 PM, "Max Leske" <[email protected]> wrote: >>>> >>>> Damien, Marcus >>>> >>>> this change breaks a lot of things in FileSystem-Git. I don’t disagree with the idea that reading characters should be default (one could argue about it…) but your change makes it IMPOSSIBLE to read bytes because unprintable characters are discarded! So if my ByteArray is a NULL terminated string, for instance, I can not check for the NULL termination anymore. >>>> >>>> Cheers, >>>> Max >> >> >
