On Dec 5, 2013 10:50 PM, "Max Leske" <[email protected]> wrote:
>
> There are several different approaches in different places:
>
> - FileStream reads strings by default. #binary and #ascii switch between
formats. File streams use an internal buffer which is either a String
(default) or a ByteArray. It’s even possible to switch between binary and
ascii midstream without losing information (if done right) because it only
affects the buffer.
> - ReadStream and WriteStream cannot change their format. Their behavior
is determined by the underlying collection. Forcing conversions (e.g. by
#asString) can lead to loss of information
> - RWBinaryOrTextStream (and other subclasses of ReadWriteStream) also
support the #binary #ascii method of switching format. Default is #ascii
> - SocketStream uses the same #binary / #ascii mechanism. Default is #ascii
> - ZnLimitedReadStream uses the same #binary / #ascii mechanism. Default
is #binary (implicit); depends on the underlying stream
>
> I think the pattern to follow is clear: ReadStream and WriteStream should
allow switching format with #ascii and #binary, default should be #ascii.
However, I suspect there’s a reason that these classes don’t support
switching, namely that switching makes the implementation more complicated
and also slower because more checks need to be made.
>
> The easiest solution I see would be to implement something like this:
>
> ReadStream>>next
> ^ self isBinary
> ifTrue: [ self basicNext asCharacter ]
> ifFalse: [ self basicNext ]
>
> However, #next et al. are implemented in a plugin and the primitive
method looks like this:
>
> ReadStream>>next
> <primitive: 65>
> position >= readLimit
> ifTrue: [^nil]
> ifFalse: [^collection at: (position := position + 1)]
>
> This means the collection instance variable has to hold either a binary
or a string collection.
>
> I’ve found a solution which would work and I’ve whipped up a working way
(there’s space for improvement…):
>
> ReadStream>>binary
> collection isString ifFalse: [ ^ self ].
> collection := (ByteArray new: collection size) copyReplaceFrom: 1 to:
collection size with: collection
>
> ReadStrem>>ascii
> collection isString ifTrue: [ ^ self ].
> collection := (String new: collection size) copyReplaceFrom: 1 to:
collection size with: collection
>
> @Damien
> opposed to what I wrote earlier, #asString does *not* destroy
non-printable characters. Instead, every byte (from 0 to 255) is encoded as
a character and thus the string can be converted back to a ByteArray
*without* loss of information. Sorry about that.

Thank you very much for your analysis. I'm a bit reluctant to change
ReadStream now, but I could be ok with enough unit tests.
Another potential solution: What about adding a #binaryReadStream method to
the memory file system as a workaround before the introduction of xstreams
?

> With this change in place the 12259 would become obsolete.
>
> Please let me know what you think. This is a pretty big change that might
have a lot of consequences in the image.
>
> Cheers,
> Max
>
> On 04.12.2013, at 13:14, Max Leske <[email protected]> wrote:
>
>> Let me see what I can come up with.
>>
>>
>> On 03.12.2013, at 19:36, Damien Cassou <[email protected]> wrote:
>>
>>> Thanks Max for the report. Do you have an idea on how we could solve
the problem ? The previous behaviour was not acceptable either because the
streams that came out of a memory filesystem were the only ones with binary
content
>>>
>>> On Dec 3, 2013 5:35 PM, "Max Leske" <[email protected]> wrote:
>>>>
>>>> Damien, Marcus
>>>>
>>>> this change breaks a lot of things in FileSystem-Git. I don’t disagree
with the idea that reading characters should be default (one could argue
about it…) but your change makes it IMPOSSIBLE to read bytes because
unprintable characters are discarded! So if my ByteArray is a NULL
terminated string, for instance, I can not check for the NULL termination
anymore.
>>>>
>>>> Cheers,
>>>> Max
>>
>>
>

Reply via email to