So in favor of not modifying ReadStream I implemented your suggestion #binaryReadStream in Filesystem (will not work on all streams now but that’s ok for me until we get Xstreams). Please take a look and let me know what you think (ReadStream would have been 2 methods only vs. ~6 :) ).
Name: SLICE-Issue-12259-FileSystem-memory-readswrites-using-a-binary-stream-by-default-MaxLeske.1 Author: MaxLeske Time: 25 January 2014, 11:38:48.027093 pm UUID: 1248d0f9-f094-400f-a736-23fd41d12e50 Ancestors: Dependencies: FileSystem-Tests-Core-MaxLeske.67, FileSystem-Memory-MaxLeske.46, FileSystem-Disk-MaxLeske.72, FileSystem-Core-MaxLeske.139 * added #binaryReadStream and friends to FileSystem (for compatibility) Cheers, Max On 10.12.2013, at 22:19, Nicolas Cellier <[email protected]> wrote: > The more you add, the more you'll have to remove > > > 2013/12/10 Damien Cassou <[email protected]> > > On Dec 5, 2013 10:50 PM, "Max Leske" <[email protected]> wrote: > > > > There are several different approaches in different places: > > > > - FileStream reads strings by default. #binary and #ascii switch between > > formats. File streams use an internal buffer which is either a String > > (default) or a ByteArray. It’s even possible to switch between binary and > > ascii midstream without losing information (if done right) because it only > > affects the buffer. > > - ReadStream and WriteStream cannot change their format. Their behavior is > > determined by the underlying collection. Forcing conversions (e.g. by > > #asString) can lead to loss of information > > - RWBinaryOrTextStream (and other subclasses of ReadWriteStream) also > > support the #binary #ascii method of switching format. Default is #ascii > > - SocketStream uses the same #binary / #ascii mechanism. Default is #ascii > > - ZnLimitedReadStream uses the same #binary / #ascii mechanism. Default is > > #binary (implicit); depends on the underlying stream > > > > I think the pattern to follow is clear: ReadStream and WriteStream should > > allow switching format with #ascii and #binary, default should be #ascii. > > However, I suspect there’s a reason that these classes don’t support > > switching, namely that switching makes the implementation more complicated > > and also slower because more checks need to be made. > > > > The easiest solution I see would be to implement something like this: > > > > ReadStream>>next > > ^ self isBinary > > ifTrue: [ self basicNext asCharacter ] > > ifFalse: [ self basicNext ] > > > > However, #next et al. are implemented in a plugin and the primitive method > > looks like this: > > > > ReadStream>>next > > <primitive: 65> > > position >= readLimit > > ifTrue: [^nil] > > ifFalse: [^collection at: (position := position + 1)] > > > > This means the collection instance variable has to hold either a binary or > > a string collection. > > > > I’ve found a solution which would work and I’ve whipped up a working way > > (there’s space for improvement…): > > > > ReadStream>>binary > > collection isString ifFalse: [ ^ self ]. > > collection := (ByteArray new: collection size) copyReplaceFrom: 1 to: > > collection size with: collection > > > > ReadStrem>>ascii > > collection isString ifTrue: [ ^ self ]. > > collection := (String new: collection size) copyReplaceFrom: 1 to: > > collection size with: collection > > > > @Damien > > opposed to what I wrote earlier, #asString does *not* destroy non-printable > > characters. Instead, every byte (from 0 to 255) is encoded as a character > > and thus the string can be converted back to a ByteArray *without* loss of > > information. Sorry about that. > Thank you very much for your analysis. I'm a bit reluctant to change > ReadStream now, but I could be ok with enough unit tests. > Another potential solution: What about adding a #binaryReadStream method to > the memory file system as a workaround before the introduction of xstreams ? > > > With this change in place the 12259 would become obsolete. > > > > Please let me know what you think. This is a pretty big change that might > > have a lot of consequences in the image. > > > > Cheers, > > Max > > > > On 04.12.2013, at 13:14, Max Leske <[email protected]> wrote: > > > >> Let me see what I can come up with. > >> > >> > >> On 03.12.2013, at 19:36, Damien Cassou <[email protected]> wrote: > >> > >>> Thanks Max for the report. Do you have an idea on how we could solve the > >>> problem ? The previous behaviour was not acceptable either because the > >>> streams that came out of a memory filesystem were the only ones with > >>> binary content > >>> > >>> On Dec 3, 2013 5:35 PM, "Max Leske" <[email protected]> wrote: > >>>> > >>>> Damien, Marcus > >>>> > >>>> this change breaks a lot of things in FileSystem-Git. I don’t disagree > >>>> with the idea that reading characters should be default (one could argue > >>>> about it…) but your change makes it IMPOSSIBLE to read bytes because > >>>> unprintable characters are discarded! So if my ByteArray is a NULL > >>>> terminated string, for instance, I can not check for the NULL > >>>> termination anymore. > >>>> > >>>> Cheers, > >>>> Max > >> > >> > > >
