Re: [Pharo-dev] 12259: FileSystem memory reads writes using a binary stream by default

Max Leske Fri, 06 Dec 2013 22:08:46 -0800

I agree but that is a problem inherent to the current implementation and it’s 
not really my goal now to fix all the shortcomings :) I simply want a 
consistent way to get through this (since I’ve heard that the streams might be 
replaced with Xtreams…).



On 07.12.2013, at 00:44, Nicolas Cellier <[email protected]> 
wrote:

> Hem, switching #ascii <-> #binary only makes sense in... ASCII.
> With any other encoding it makes no sense at all, or maybe it should be
> #latin1 <-> #binary, #utf8 <-> #binary, #utf16 <-> #binary
> 
> 
> 2013/12/5 Max Leske <[email protected]>
> There are several different approaches in different places:
> 
> - FileStream reads strings by default. #binary and #ascii switch between 
> formats. File streams use an internal buffer which is either a String 
> (default) or a ByteArray. It’s even possible to switch between binary and 
> ascii midstream without losing information (if done right) because it only 
> affects the buffer.
> - ReadStream and WriteStream cannot change their format. Their behavior is 
> determined by the underlying collection. Forcing conversions (e.g. by 
> #asString) can lead to loss of information
> - RWBinaryOrTextStream (and other subclasses of ReadWriteStream) also support 
> the #binary #ascii method of switching format. Default is #ascii
> - SocketStream uses the same #binary / #ascii mechanism. Default is #ascii
> - ZnLimitedReadStream uses the same #binary / #ascii mechanism. Default is 
> #binary (implicit); depends on the underlying stream
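> 
> A minimal sketch of the ReadStream point above, assuming a Pharo image 
> where ReadStream class>>on: and the #[ ] ByteArray literal are available. 
> The variable names are only illustrative:
> 
> ```smalltalk
> | textStream byteStream |
> "The element type follows the underlying collection."
> textStream := ReadStream on: 'abc'.
> byteStream := ReadStream on: #[97 98 99].
> textStream next.    "answers the Character $a"
> byteStream next.    "answers the SmallInteger 97"
> ```
> 
> Neither stream has any state of its own that says "binary" or "text"; 
> only the collection differs.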
> 
> I think the pattern to follow is clear: ReadStream and WriteStream should 
> allow switching format with #ascii and #binary, default should be #ascii. 
> However, I suspect there’s a reason that these classes don’t support 
> switching, namely that switching makes the implementation more complicated 
> and also slower because more checks need to be made.
> 
> The easiest solution I see would be to implement something like this:
> 
> ReadStream>>next
>       ^ self isBinary
>               ifTrue: [ self basicNext asInteger ]
>               ifFalse: [ self basicNext asCharacter ]
> 
> However, #next et al. are implemented in a plugin and the primitive method 
> looks like this:
> 
> ReadStream>>next
>       <primitive: 65> 
>       position >= readLimit 
>               ifTrue: [^nil] 
>               ifFalse: [^collection at: (position := position + 1)]
> 
> This means the collection instance variable has to hold either a binary or a 
> string collection.
> 
> I’ve found a solution that would work and whipped up a working prototype 
> (there’s room for improvement…):
> 
> ReadStream>>binary
>       collection isString ifFalse: [ ^ self ].
>       collection := (ByteArray new: collection size)
>               copyReplaceFrom: 1 to: collection size with: collection
> 
> ReadStream>>ascii
>       collection isString ifTrue: [ ^ self ].
>       collection := (String new: collection size)
>               copyReplaceFrom: 1 to: collection size with: collection
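> 
> A hedged usage sketch of switching midstream. It assumes the proposed 
> #binary method has been added to ReadStream as written in this message; 
> it is not part of a stock image:
> 
> ```smalltalk
> | stream first second |
> stream := ReadStream on: 'ab'.
> first := stream next.     "a Character, read from the String collection"
> stream binary.            "proposed method: swaps collection for a ByteArray copy"
> second := stream next.    "should now be a byte, if the conversion works as described"
> ```
> 
> The position and read limit are untouched by the switch, which is why 
> reading can continue where it left off.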
> 
> @Damien
> Contrary to what I wrote earlier, #asString does *not* destroy non-printable 
> characters. Instead, every byte (from 0 to 255) is encoded as a character and 
> thus the string can be converted back to a ByteArray *without* loss of 
> information. Sorry about that.
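> 
> The round trip can be checked with a short sketch like the following, 
> assuming ByteArray>>asString and String>>asByteArray behave as described 
> here (one character per byte, no encoding step):
> 
> ```smalltalk
> | bytes string roundTrip |
> "Build a ByteArray holding every byte value from 0 to 255."
> bytes := ByteArray new: 256.
> 1 to: 256 do: [ :i | bytes at: i put: i - 1 ].
> string := bytes asString.
> roundTrip := string asByteArray.
> roundTrip = bytes.    "answers true if the round trip is lossless"
> ```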
> 
> With this change in place, 12259 would become obsolete.
> 
> Please let me know what you think. This is a pretty big change that might 
> have a lot of consequences in the image.
> 
> Cheers,
> Max
> 
> On 04.12.2013, at 13:14, Max Leske <[email protected]> wrote:
> 
>> Let me see what I can come up with.
>> 
>> 
>> On 03.12.2013, at 19:36, Damien Cassou <[email protected]> wrote:
>> 
>>> Thanks Max for the report. Do you have an idea on how we could solve the 
>>> problem ? The previous behaviour was not acceptable either because the 
>>> streams that came out of a memory filesystem were the only ones with binary 
>>> content
>>> 
>>> On Dec 3, 2013 5:35 PM, "Max Leske" <[email protected]> wrote:
>>> Damien, Marcus
>>> 
>>> this change breaks a lot of things in FileSystem-Git. I don’t disagree with 
>>> the idea that reading characters should be default (one could argue about 
>>> it…) but your change makes it IMPOSSIBLE to read bytes because unprintable 
>>> characters are discarded! So if my ByteArray is a NULL-terminated string, 
>>> for instance, I cannot check for the NULL termination anymore.
>>> 
>>> Cheers,
>>> Max
>> 
> 
>
