Re: [Pharo-dev] 12259: FileSystem memory reads writes using a binary stream by default

Max Leske Thu, 05 Dec 2013 13:51:04 -0800

There are several different approaches in different places:

- FileStream reads strings by default. #binary and #ascii switch between 
formats. File streams use an internal buffer which is either a String (default) 
or a ByteArray. It’s even possible to switch between binary and ascii midstream 
without losing information (if done right) because it only affects the buffer.
- ReadStream and WriteStream cannot change their format. Their behavior is 
determined by the underlying collection. Forcing conversions (e.g. with 
#asString) can lead to loss of information.
- RWBinaryOrTextStream (and other subclasses of ReadWriteStream) also support 
switching format with #binary / #ascii. Default is #ascii.
- SocketStream uses the same #binary / #ascii mechanism. Default is #ascii.
- ZnLimitedReadStream uses the same #binary / #ascii mechanism. Default is 
#binary (implicit); it depends on the underlying stream.
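
A small illustration of the ReadStream point, as a sketch for a Pharo playground (the literals are made up for the example): the same class answers characters or integers purely as a function of the collection it wraps.

```smalltalk
| textStream byteStream |
"Same bytes, two different underlying collections."
textStream := ReadStream on: 'abc'.
byteStream := ReadStream on: #[97 98 99].

textStream next.   "answers the Character $a"
byteStream next.   "answers the SmallInteger 97"
```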

I think the pattern to follow is clear: ReadStream and WriteStream should allow 
switching format with #ascii and #binary, and the default should be #ascii. 
However, I suspect there’s a reason these classes don’t support switching: 
it makes the implementation more complicated and also slower, because more 
checks need to be made.

The easiest solution I see would be to implement something like this:

ReadStream>>next
        "Assumes basicNext answers the raw element of a byte collection"
        ^ self isBinary
                ifTrue: [ self basicNext ]
                ifFalse: [ self basicNext asCharacter ]

However, #next et al. are implemented in a plugin and the primitive method 
looks like this:

ReadStream>>next
        <primitive: 65> 
        position >= readLimit 
                ifTrue: [^nil] 
                ifFalse: [^collection at: (position := position + 1)]

This means the collection instance variable has to hold either a binary or a 
string collection.

I’ve found a solution that would work and I’ve whipped up a working prototype 
(there’s room for improvement…):

ReadStream>>binary
        collection isString ifFalse: [ ^ self ].
        collection := (ByteArray new: collection size)
                copyReplaceFrom: 1 to: collection size with: collection

ReadStream>>ascii
        collection isString ifTrue: [ ^ self ].
        collection := (String new: collection size)
                copyReplaceFrom: 1 to: collection size with: collection

@Damien
Contrary to what I wrote earlier, #asString does *not* destroy non-printable 
characters. Instead, every byte (from 0 to 255) is encoded as a character and 
thus the string can be converted back to a ByteArray *without* loss of 
information. Sorry about that.
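
To illustrate the round trip, a playground sketch (assuming an image where ByteArray>>asString maps each byte to the character with the same code, as described above; the sample bytes are arbitrary):

```smalltalk
| bytes roundTripped |
"Includes a NULL byte, a printable byte and a high byte."
bytes := #[0 65 255].
roundTripped := bytes asString asByteArray.

"Expected to answer true if no information is lost."
roundTripped = bytes.
```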

With this change in place, 12259 would become obsolete.

Please let me know what you think. This is a pretty big change that might have 
a lot of consequences in the image.

Cheers,
Max

On 04.12.2013, at 13:14, Max Leske <[email protected]> wrote:

> Let me see what I can come up with.
> 
> 
> On 03.12.2013, at 19:36, Damien Cassou <[email protected]> wrote:
> 
>> Thanks Max for the report. Do you have an idea on how we could solve the 
>> problem ? The previous behaviour was not acceptable either because the 
>> streams that came out of a memory filesystem were the only ones with binary 
>> content
>> 
>> On Dec 3, 2013 5:35 PM, "Max Leske" <[email protected]> wrote:
>> Damien, Marcus
>> 
>> this change breaks a lot of things in FileSystem-Git. I don’t disagree with 
>> the idea that reading characters should be default (one could argue about 
>> it…) but your change makes it IMPOSSIBLE to read bytes because unprintable 
>> characters are discarded! So if my ByteArray is a NULL terminated string, 
>> for instance, I can not check for the NULL termination anymore.
>> 
>> Cheers,
>> Max
>
