Re: [Pharo-dev] 12259: FileSystem memory reads/writes using a binary stream by default

2014-01-25 Thread Max Leske
So, in favor of not modifying ReadStream, I implemented your suggestion of 
#binaryReadStream in FileSystem (it will not work on all streams for now, but 
that’s ok for me until we get Xtreams).
Please take a look and let me know what you think (ReadStream would have been 2 
methods only vs. ~6 :) ).

Name: 
SLICE-Issue-12259-FileSystem-memory-readswrites-using-a-binary-stream-by-default-MaxLeske.1
Author: MaxLeske
Time: 25 January 2014, 11:38:48.027093 pm
UUID: 1248d0f9-f094-400f-a736-23fd41d12e50
Ancestors: 
Dependencies: FileSystem-Tests-Core-MaxLeske.67, FileSystem-Memory-MaxLeske.46, 
FileSystem-Disk-MaxLeske.72, FileSystem-Core-MaxLeske.139

* added #binaryReadStream and friends to FileSystem (for compatibility)
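
As a rough illustration of what a dedicated binary accessor buys, the following Playground sketch writes three bytes (including a trailing NUL) to a memory file system and reads them back as bytes. The selectors #binaryWriteStreamDo: and #binaryReadStreamDo: are the ones found on FileReference in later Pharo releases; whether the slice above uses exactly these names is an assumption, and the file name is made up.

```smalltalk
"Sketch only: selectors assumed from later Pharo releases, not taken from the slice."
| fs file bytes |
fs := FileSystem memory.
file := fs / 'data.bin'.   "hypothetical file name"
file binaryWriteStreamDo: [ :out | out nextPutAll: #[72 105 0] ].
bytes := file binaryReadStreamDo: [ :in | in upToEnd ].
bytes last.   "expected to be the NUL byte if nothing was discarded"
```

If the NUL byte comes back intact, a NULL-terminated string stored in a memory file can be checked for its terminator again.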

Cheers,
Max


On 10.12.2013, at 22:19, Nicolas Cellier nicolas.cellier.aka.n...@gmail.com 
wrote:

 The more you add, the more you'll have to remove
 
 
 2013/12/10 Damien Cassou damien.cas...@gmail.com
 
 On Dec 5, 2013 10:50 PM, Max Leske maxle...@gmail.com wrote:
 
  There are several different approaches in different places:
 
  - FileStream reads strings by default. #binary and #ascii switch between 
  formats. File streams use an internal buffer which is either a String 
  (default) or a ByteArray. It’s even possible to switch between binary and 
  ascii midstream without losing information (if done right) because it only 
  affects the buffer.
  - ReadStream and WriteStream cannot change their format. Their behavior is 
  determined by the underlying collection. Forcing conversions (e.g. by 
  #asString) can lead to loss of information
  - RWBinaryOrTextStream (and other subclasses of ReadWriteStream) also 
  support the #binary #ascii method of switching format. Default is #ascii
  - SocketStream uses the same #binary / #ascii mechanism. Default is #ascii
  - ZnLimitedReadStream uses the same #binary / #ascii mechanism. Default is 
  #binary (implicit); depends on the underlying stream
 
  I think the pattern to follow is clear: ReadStream and WriteStream should 
  allow switching format with #ascii and #binary, default should be #ascii. 
  However, I suspect there’s a reason that these classes don’t support 
  switching, namely that switching makes the implementation more complicated 
  and also slower because more checks need to be made.
 
  The easiest solution I see would be to implement something like this:
 
  ReadStream>>next
  ^ self isBinary
  ifTrue: [ self basicNext asCharacter ]
  ifFalse: [ self basicNext ]
 
  However, #next et al. are implemented in a plugin and the primitive method 
  looks like this:
 
  ReadStream>>next
  <primitive: 65>
  position = readLimit 
  ifTrue: [^nil] 
  ifFalse: [^collection at: (position := position + 1)]
 
  This means the collection instance variable has to hold either a binary or 
  a string collection.
 
  I’ve found a solution which would work and I’ve whipped up a working way 
  (there’s space for improvement…):
 
  ReadStream>>binary
  collection isString ifFalse: [ ^ self ].
  collection := (ByteArray new: collection size) copyReplaceFrom: 1 to: 
  collection size with: collection
 
  ReadStream>>ascii
  collection isString ifTrue: [ ^ self ].
  collection := (String new: collection size) copyReplaceFrom: 1 to: 
  collection size with: collection
 
  @Damien
  opposed to what I wrote earlier, #asString does *not* destroy non-printable 
  characters. Instead, every byte (from 0 to 255) is encoded as a character 
  and thus the string can be converted back to a ByteArray *without* loss of 
  information. Sorry about that.
 Thank you very much for your analysis. I'm a bit reluctant to change 
 ReadStream now, but I could be ok with enough unit tests. 
 Another potential solution: What about adding a #binaryReadStream method to 
 the memory file system as a workaround before the introduction of Xtreams?
 
  With this change in place the 12259 would become obsolete.
 
  Please let me know what you think. This is a pretty big change that might 
  have a lot of consequences in the image.
 
  Cheers,
  Max
 
  On 04.12.2013, at 13:14, Max Leske maxle...@gmail.com wrote:
 
  Let me see what I can come up with.
 
 
  On 03.12.2013, at 19:36, Damien Cassou damien.cas...@gmail.com wrote:
 
  Thanks Max for the report. Do you have an idea on how we could solve the 
  problem ? The previous behaviour was not acceptable either because the 
  streams that came out of a memory filesystem were the only ones with 
  binary content
 
  On Dec 3, 2013 5:35 PM, Max Leske maxle...@gmail.com wrote:
 
  Damien, Marcus
 
  this change breaks a lot of things in FileSystem-Git. I don’t disagree 
  with the idea that reading characters should be default (one could argue 
  about it…) but your change makes it IMPOSSIBLE to read bytes because 
  unprintable characters are discarded! So if my ByteArray is a NULL 
  terminated string, for instance, I can not check for the NULL 
  termination anymore.
 
  Cheers,
  Max
 
 
 
 



Re: [Pharo-dev] 12259: FileSystem memory reads/writes using a binary stream by default

2013-12-10 Thread Damien Cassou
On Dec 5, 2013 10:50 PM, Max Leske maxle...@gmail.com wrote:

 There are several different approaches in different places:

 - FileStream reads strings by default. #binary and #ascii switch between
formats. File streams use an internal buffer which is either a String
(default) or a ByteArray. It’s even possible to switch between binary and
ascii midstream without losing information (if done right) because it only
affects the buffer.
 - ReadStream and WriteStream cannot change their format. Their behavior
is determined by the underlying collection. Forcing conversions (e.g. by
#asString) can lead to loss of information
 - RWBinaryOrTextStream (and other subclasses of ReadWriteStream) also
support the #binary #ascii method of switching format. Default is #ascii
 - SocketStream uses the same #binary / #ascii mechanism. Default is #ascii
 - ZnLimitedReadStream uses the same #binary / #ascii mechanism. Default
is #binary (implicit); depends on the underlying stream

 I think the pattern to follow is clear: ReadStream and WriteStream should
allow switching format with #ascii and #binary, default should be #ascii.
However, I suspect there’s a reason that these classes don’t support
switching, namely that switching makes the implementation more complicated
and also slower because more checks need to be made.

 The easiest solution I see would be to implement something like this:

 ReadStream>>next
 ^ self isBinary
 ifTrue: [ self basicNext asCharacter ]
 ifFalse: [ self basicNext ]

 However, #next et al. are implemented in a plugin and the primitive
method looks like this:

 ReadStream>>next
 <primitive: 65>
 position = readLimit
 ifTrue: [^nil]
 ifFalse: [^collection at: (position := position + 1)]

 This means the collection instance variable has to hold either a binary
or a string collection.

 I’ve found a solution which would work and I’ve whipped up a working way
(there’s space for improvement…):

 ReadStream>>binary
 collection isString ifFalse: [ ^ self ].
 collection := (ByteArray new: collection size) copyReplaceFrom: 1 to:
collection size with: collection

 ReadStream>>ascii
 collection isString ifTrue: [ ^ self ].
 collection := (String new: collection size) copyReplaceFrom: 1 to:
collection size with: collection

 @Damien
 opposed to what I wrote earlier, #asString does *not* destroy
non-printable characters. Instead, every byte (from 0 to 255) is encoded as
a character and thus the string can be converted back to a ByteArray
*without* loss of information. Sorry about that.

Thank you very much for your analysis. I'm a bit reluctant to change
ReadStream now, but I could be ok with enough unit tests.
Another potential solution: What about adding a #binaryReadStream method to
the memory file system as a workaround before the introduction of Xtreams?

 With this change in place the 12259 would become obsolete.

 Please let me know what you think. This is a pretty big change that might
have a lot of consequences in the image.

 Cheers,
 Max

 On 04.12.2013, at 13:14, Max Leske maxle...@gmail.com wrote:

 Let me see what I can come up with.


 On 03.12.2013, at 19:36, Damien Cassou damien.cas...@gmail.com wrote:

 Thanks Max for the report. Do you have an idea on how we could solve
the problem ? The previous behaviour was not acceptable either because the
streams that came out of a memory filesystem were the only ones with binary
content

 On Dec 3, 2013 5:35 PM, Max Leske maxle...@gmail.com wrote:

 Damien, Marcus

 this change breaks a lot of things in FileSystem-Git. I don’t disagree
with the idea that reading characters should be default (one could argue
about it…) but your change makes it IMPOSSIBLE to read bytes because
unprintable characters are discarded! So if my ByteArray is a NULL
terminated string, for instance, I can not check for the NULL termination
anymore.

 Cheers,
 Max





Re: [Pharo-dev] 12259: FileSystem memory reads/writes using a binary stream by default

2013-12-10 Thread Nicolas Cellier
The more you add, the more you'll have to remove


2013/12/10 Damien Cassou damien.cas...@gmail.com


 On Dec 5, 2013 10:50 PM, Max Leske maxle...@gmail.com wrote:
 
  There are several different approaches in different places:
 
  - FileStream reads strings by default. #binary and #ascii switch between
 formats. File streams use an internal buffer which is either a String
 (default) or a ByteArray. It’s even possible to switch between binary and
 ascii midstream without losing information (if done right) because it only
 affects the buffer.
  - ReadStream and WriteStream cannot change their format. Their behavior
 is determined by the underlying collection. Forcing conversions (e.g. by
 #asString) can lead to loss of information
  - RWBinaryOrTextStream (and other subclasses of ReadWriteStream) also
 support the #binary #ascii method of switching format. Default is #ascii
  - SocketStream uses the same #binary / #ascii mechanism. Default is
 #ascii
  - ZnLimitedReadStream uses the same #binary / #ascii mechanism. Default
 is #binary (implicit); depends on the underlying stream
 
  I think the pattern to follow is clear: ReadStream and WriteStream
 should allow switching format with #ascii and #binary, default should be
 #ascii. However, I suspect there’s a reason that these classes don’t
 support switching, namely that switching makes the implementation more
 complicated and also slower because more checks need to be made.
 
  The easiest solution I see would be to implement something like this:
 
  ReadStream>>next
  ^ self isBinary
  ifTrue: [ self basicNext asCharacter ]
  ifFalse: [ self basicNext ]
 
  However, #next et al. are implemented in a plugin and the primitive
 method looks like this:
 
  ReadStream>>next
  <primitive: 65>
  position = readLimit
  ifTrue: [^nil]
  ifFalse: [^collection at: (position := position + 1)]
 
  This means the collection instance variable has to hold either a binary
 or a string collection.
 
  I’ve found a solution which would work and I’ve whipped up a working way
 (there’s space for improvement…):
 
  ReadStream>>binary
  collection isString ifFalse: [ ^ self ].
  collection := (ByteArray new: collection size) copyReplaceFrom: 1 to:
 collection size with: collection
 
  ReadStream>>ascii
  collection isString ifTrue: [ ^ self ].
  collection := (String new: collection size) copyReplaceFrom: 1 to:
 collection size with: collection
 
  @Damien
  opposed to what I wrote earlier, #asString does *not* destroy
 non-printable characters. Instead, every byte (from 0 to 255) is encoded as
 a character and thus the string can be converted back to a ByteArray
 *without* loss of information. Sorry about that.

 Thank you very much for your analysis. I'm a bit reluctant to change
 ReadStream now, but I could be ok with enough unit tests.
 Another potential solution: What about adding a #binaryReadStream method
 to the memory file system as a workaround before the introduction of
 Xtreams?

  With this change in place the 12259 would become obsolete.
 
  Please let me know what you think. This is a pretty big change that
 might have a lot of consequences in the image.
 
  Cheers,
  Max
 
  On 04.12.2013, at 13:14, Max Leske maxle...@gmail.com wrote:
 
  Let me see what I can come up with.
 
 
  On 03.12.2013, at 19:36, Damien Cassou damien.cas...@gmail.com wrote:
 
  Thanks Max for the report. Do you have an idea on how we could solve
 the problem ? The previous behaviour was not acceptable either because the
 streams that came out of a memory filesystem were the only ones with binary
 content
 
  On Dec 3, 2013 5:35 PM, Max Leske maxle...@gmail.com wrote:
 
  Damien, Marcus
 
  this change breaks a lot of things in FileSystem-Git. I don’t
 disagree with the idea that reading characters should be default (one could
 argue about it…) but your change makes it IMPOSSIBLE to read bytes because
 unprintable characters are discarded! So if my ByteArray is a NULL
 terminated string, for instance, I can not check for the NULL termination
 anymore.
 
  Cheers,
  Max
 
 
 



Re: [Pharo-dev] 12259: FileSystem memory reads/writes using a binary stream by default

2013-12-06 Thread Nicolas Cellier
Hem, switching #ascii <-> #binary only makes sense in... ASCII.
With every other encoding, it's not something that makes sense at all, or
maybe #latin1 <-> #binary, #utf8 <-> #binary, #utf16 <-> #binary
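
To make the point concrete, here is a small Playground sketch comparing the one-byte-per-character view with a UTF-8 encoding of the same one-character string. It assumes Zinc's ZnUTF8Encoder is loaded in the image; the variable names are illustrative only.

```smalltalk
"Sketch: the same text yields different bytes depending on the encoding."
| text directBytes utf8Bytes |
text := String with: (Character value: 233).         "e with acute accent"
directBytes := text asByteArray.                     "one byte per character"
utf8Bytes := ZnUTF8Encoder new encodeString: text.   "assumes Zinc is loaded"
directBytes size.   "1"
utf8Bytes size.     "2"
```

A plain #ascii / #binary switch has no way to say which of these two byte sequences is meant.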


2013/12/5 Max Leske maxle...@gmail.com

 There are several different approaches in different places:

 - FileStream reads strings by default. #binary and #ascii switch between
 formats. File streams use an internal buffer which is either a String
 (default) or a ByteArray. It’s even possible to switch between binary and
 ascii midstream without losing information (if done right) because it only
 affects the buffer.
 - ReadStream and WriteStream cannot change their format. Their behavior is
 determined by the underlying collection. Forcing conversions (e.g. by
 #asString) can lead to loss of information
 - RWBinaryOrTextStream (and other subclasses of ReadWriteStream) also
 support the #binary #ascii method of switching format. Default is #ascii
 - SocketStream uses the same #binary / #ascii mechanism. Default is #ascii
 - ZnLimitedReadStream uses the same #binary / #ascii mechanism. Default is
 #binary (implicit); depends on the underlying stream

 I think the pattern to follow is clear: ReadStream and WriteStream should
 allow switching format with #ascii and #binary, default should be #ascii.
 However, I suspect there’s a reason that these classes don’t support
 switching, namely that switching makes the implementation more complicated
 and also slower because more checks need to be made.

 The easiest solution I see would be to implement something like this:

 ReadStream>>next
 ^ self isBinary
 ifTrue: [ self basicNext asCharacter ]
 ifFalse: [ self basicNext ]

 However, #next et al. are implemented in a plugin and the primitive method
 looks like this:

 ReadStream>>next
 <primitive: 65>
 position = readLimit
 ifTrue: [^nil]
 ifFalse: [^collection at: (position := position + 1)]

 This means the collection instance variable has to hold either a binary or
 a string collection.

 I’ve found a solution which would work and I’ve whipped up a working way
 (there’s space for improvement…):

 ReadStream>>binary
 collection isString ifFalse: [ ^ self ].
 collection := (ByteArray new: collection size) copyReplaceFrom: 1 to:
 collection size with: collection

 ReadStream>>ascii
 collection isString ifTrue: [ ^ self ].
 collection := (String new: collection size) copyReplaceFrom: 1 to:
 collection size with: collection

 @Damien
 opposed to what I wrote earlier, #asString does *not* destroy
 non-printable characters. Instead, every byte (from 0 to 255) is encoded as
 a character and thus the string can be converted back to a ByteArray
 *without* loss of information. Sorry about that.

 With this change in place the 12259 would become obsolete.

 Please let me know what you think. This is a pretty big change that might
 have a lot of consequences in the image.

 Cheers,
 Max

 On 04.12.2013, at 13:14, Max Leske maxle...@gmail.com wrote:

 Let me see what I can come up with.


 On 03.12.2013, at 19:36, Damien Cassou damien.cas...@gmail.com wrote:

 Thanks Max for the report. Do you have an idea on how we could solve the
 problem ? The previous behaviour was not acceptable either because the
 streams that came out of a memory filesystem were the only ones with binary
 content
 On Dec 3, 2013 5:35 PM, Max Leske maxle...@gmail.com wrote:

 Damien, Marcus

 this change breaks a lot of things in FileSystem-Git. I don’t disagree
 with the idea that reading characters should be default (one could argue
 about it…) but your change makes it IMPOSSIBLE to read bytes because
 unprintable characters are discarded! So if my ByteArray is a NULL
 terminated string, for instance, I can not check for the NULL termination
 anymore.

 Cheers,
 Max






Re: [Pharo-dev] 12259: FileSystem memory reads/writes using a binary stream by default

2013-12-06 Thread Max Leske
I agree but that is a problem inherent to the current implementation and it’s 
not really my goal now to fix all the shortcomings :) I simply want a 
consistent way to get through this (since I’ve heard that the streams might be 
replaced with Xtreams…).


On 07.12.2013, at 00:44, Nicolas Cellier nicolas.cellier.aka.n...@gmail.com 
wrote:

 Hem, switching #ascii <-> #binary only makes sense in... ASCII.
 With every other encoding, it's not something that makes sense at all, or 
 maybe #latin1 <-> #binary, #utf8 <-> #binary, #utf16 <-> #binary
 
 
 2013/12/5 Max Leske maxle...@gmail.com
 There are several different approaches in different places:
 
 - FileStream reads strings by default. #binary and #ascii switch between 
 formats. File streams use an internal buffer which is either a String 
 (default) or a ByteArray. It’s even possible to switch between binary and 
 ascii midstream without losing information (if done right) because it only 
 affects the buffer.
 - ReadStream and WriteStream cannot change their format. Their behavior is 
 determined by the underlying collection. Forcing conversions (e.g. by 
 #asString) can lead to loss of information
 - RWBinaryOrTextStream (and other subclasses of ReadWriteStream) also support 
 the #binary #ascii method of switching format. Default is #ascii
 - SocketStream uses the same #binary / #ascii mechanism. Default is #ascii
 - ZnLimitedReadStream uses the same #binary / #ascii mechanism. Default is 
 #binary (implicit); depends on the underlying stream
 
 I think the pattern to follow is clear: ReadStream and WriteStream should 
 allow switching format with #ascii and #binary, default should be #ascii. 
 However, I suspect there’s a reason that these classes don’t support 
 switching, namely that switching makes the implementation more complicated 
 and also slower because more checks need to be made.
 
 The easiest solution I see would be to implement something like this:
 
 ReadStream>>next
   ^ self isBinary
   ifTrue: [ self basicNext asCharacter ]
   ifFalse: [ self basicNext ]
 
 However, #next et al. are implemented in a plugin and the primitive method 
 looks like this:
 
 ReadStream>>next
   <primitive: 65>
   position = readLimit 
   ifTrue: [^nil] 
   ifFalse: [^collection at: (position := position + 1)]
 
 This means the collection instance variable has to hold either a binary or a 
 string collection.
 
 I’ve found a solution which would work and I’ve whipped up a working way 
 (there’s space for improvement…):
 
 ReadStream>>binary
   collection isString ifFalse: [ ^ self ].
   collection := (ByteArray new: collection size) copyReplaceFrom: 1 to: 
 collection size with: collection
 
 ReadStream>>ascii
   collection isString ifTrue: [ ^ self ].
   collection := (String new: collection size) copyReplaceFrom: 1 to: 
 collection size with: collection
 
 @Damien
 opposed to what I wrote earlier, #asString does *not* destroy non-printable 
 characters. Instead, every byte (from 0 to 255) is encoded as a character and 
 thus the string can be converted back to a ByteArray *without* loss of 
 information. Sorry about that.
 
 With this change in place the 12259 would become obsolete.
 
 Please let me know what you think. This is a pretty big change that might 
 have a lot of consequences in the image.
 
 Cheers,
 Max
 
 On 04.12.2013, at 13:14, Max Leske maxle...@gmail.com wrote:
 
 Let me see what I can come up with.
 
 
 On 03.12.2013, at 19:36, Damien Cassou damien.cas...@gmail.com wrote:
 
 Thanks Max for the report. Do you have an idea on how we could solve the 
 problem ? The previous behaviour was not acceptable either because the 
 streams that came out of a memory filesystem were the only ones with binary 
 content
 
 On Dec 3, 2013 5:35 PM, Max Leske maxle...@gmail.com wrote:
 Damien, Marcus
 
 this change breaks a lot of things in FileSystem-Git. I don’t disagree with 
 the idea that reading characters should be default (one could argue about 
 it…) but your change makes it IMPOSSIBLE to read bytes because unprintable 
 characters are discarded! So if my ByteArray is a NULL terminated string, 
 for instance, I can not check for the NULL termination anymore.
 
 Cheers,
 Max
 
 
 



Re: [Pharo-dev] 12259: FileSystem memory reads/writes using a binary stream by default

2013-12-05 Thread Henrik Johansen

On 05 Dec 2013, at 7:48 , Max Leske maxle...@gmail.com wrote:

 Thanks for the input.
 
 The problem with ReadStream and WriteStream is that they (at least in 2.0 and 
 3.0) never supported switching in the first place. #binary and #ascii simply 
 answer self. That means the collection the stream operates on predetermines 
 the output. I’ve noticed a couple of different approaches to solving this 
 problem in the way that streams are used throughout the image and I want to 
 make a list of those to see if there’s some kind of pattern that maybe leads 
 to an acceptable version with little changes as opposed to an optimal version 
 where we have to change a lot of the implementation details of streams.
 
 Cheers,
 Max

In my experience, the legacy solution for swapping binary/ascii writing to 
in-memory collections is using a (MultiByte)BinaryOrTextStream instead of a 
standard ReadWriteStream.

Cheers,
Henry

P.S. In fact, that class is pretty much the ultimate example of why the XTreams 
approach is superior.
It *looks* like it’s interchangeable with (MultiByte)FileStream, but the 
implementations are completely independent, and thus a subtle set of 
differences are bound to exist. (and do, but I can’t remember exactly which atm)
Maintaining and bug fixing either is a nightmare, and introducing further 
divergence bugs in the long run is pretty much inevitable.

Much better to compose singly implemented, single-purpose wrapper streams on 
top of dumb terminals for each destination type to achieve the desired input -> 
output transformations.






Re: [Pharo-dev] 12259: FileSystem memory reads/writes using a binary stream by default

2013-12-05 Thread Max Leske

On 05.12.2013, at 10:12, Henrik Johansen henrik.s.johan...@veloxit.no wrote:

 
 On 05 Dec 2013, at 7:48 , Max Leske maxle...@gmail.com wrote:
 
 Thanks for the input.
 
 The problem with ReadStream and WriteStream is that they (at least in 2.0 
 and 3.0) never supported switching in the first place. #binary and #ascii 
 simply answer self. That means the collection the stream operates on 
 predetermines the output. I’ve noticed a couple of different approaches to 
 solving this problem in the way that streams are used throughout the image 
 and I want to make a list of those to see if there’s some kind of pattern 
 that maybe leads to an acceptable version with little changes as opposed to 
 an optimal version where we have to change a lot of the implementation 
 details of streams.
 
 Cheers,
 Max
 
 In my experience, the legacy solution for swapping binary/ascii writing to 
 in-memory collections is using a (MultiByte)BinaryOrTextStream instead of a 
 standard ReadWriteStream.

You’re right. I had thought about simply using RWBinaryOrTextStream instead of 
ReadStream in MemoryFileSystem but that felt like committing a crime… :)

 
 Cheers,
 Henry
 
 P.S. In fact, that class is pretty much the ultimate example of why the 
 XTreams approach is superior.
 It *looks* like it’s interchangeable with (MultiByte)FileStream, but the 
 implementations are completely independent, and thus a subtle set of 
 differences are bound to exist. (and do, but I can’t remember exactly which 
 atm)
 Maintaining and bug fixing either is a nightmare, and introducing further 
 divergence bugs in the long run is pretty much inevitable.
 
 Much better to compose singly implemented, single-purpose wrapper streams 
 on top of dumb terminals for each destination type to achieve the desired 
 input -> output transformations.
 
 




Re: [Pharo-dev] 12259: FileSystem memory reads/writes using a binary stream by default

2013-12-05 Thread Max Leske
There are several different approaches in different places:

- FileStream reads strings by default. #binary and #ascii switch between 
formats. File streams use an internal buffer which is either a String (default) 
or a ByteArray. It’s even possible to switch between binary and ascii midstream 
without losing information (if done right) because it only affects the buffer.
- ReadStream and WriteStream cannot change their format. Their behavior is 
determined by the underlying collection. Forcing conversions (e.g. by 
#asString) can lead to loss of information
- RWBinaryOrTextStream (and other subclasses of ReadWriteStream) also support 
the #binary #ascii method of switching format. Default is #ascii
- SocketStream uses the same #binary / #ascii mechanism. Default is #ascii
- ZnLimitedReadStream uses the same #binary / #ascii mechanism. Default is 
#binary (implicit); depends on the underlying stream
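
The ReadStream bullet above can be seen directly in a Playground. This is only a sketch with made-up variable names; it shows that the element type answered by #next follows the collection the stream was created on.

```smalltalk
"Sketch: what #next answers is determined by the underlying collection."
| textStream byteStream firstCharacter firstByte |
textStream := ReadStream on: 'abc'.
byteStream := ReadStream on: #[97 98 99].
firstCharacter := textStream next.   "a Character, $a"
firstByte := byteStream next.        "a SmallInteger, 97"
```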

I think the pattern to follow is clear: ReadStream and WriteStream should allow 
switching format with #ascii and #binary, default should be #ascii. However, I 
suspect there’s a reason that these classes don’t support switching, namely 
that switching makes the implementation more complicated and also slower 
because more checks need to be made.

The easiest solution I see would be to implement something like this:

ReadStream>>next
^ self isBinary
ifTrue: [ self basicNext asCharacter ]
ifFalse: [ self basicNext ]

However, #next et al. are implemented in a plugin and the primitive method 
looks like this:

ReadStream>>next
<primitive: 65>
position = readLimit 
ifTrue: [^nil] 
ifFalse: [^collection at: (position := position + 1)]

This means the collection instance variable has to hold either a binary or a 
string collection.

I’ve found a solution which would work and I’ve whipped up a working way 
(there’s space for improvement…):

ReadStream>>binary
collection isString ifFalse: [ ^ self ].
collection := (ByteArray new: collection size) copyReplaceFrom: 1 to: 
collection size with: collection

ReadStream>>ascii
collection isString ifTrue: [ ^ self ].
collection := (String new: collection size) copyReplaceFrom: 1 to: 
collection size with: collection

@Damien
opposed to what I wrote earlier, #asString does *not* destroy non-printable 
characters. Instead, every byte (from 0 to 255) is encoded as a character and 
thus the string can be converted back to a ByteArray *without* loss of 
information. Sorry about that.
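
The round trip described in the note to Damien can be checked with a few lines in a Playground. This is a sketch that assumes ByteArray>>asString and String>>asByteArray map each byte to the character with the same code and back, as they did in images of that period.

```smalltalk
"Sketch: convert all 256 byte values to a String and back again."
| bytes string roundTrip |
bytes := (0 to: 255) asByteArray.
string := bytes asString.
roundTrip := string asByteArray.
roundTrip = bytes.            "should answer true if no byte is lost"
(string at: 1) asInteger.     "0: the NUL character is still there"
```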

With this change in place the 12259 would become obsolete.

Please let me know what you think. This is a pretty big change that might have 
a lot of consequences in the image.

Cheers,
Max

On 04.12.2013, at 13:14, Max Leske maxle...@gmail.com wrote:

 Let me see what I can come up with.
 
 
 On 03.12.2013, at 19:36, Damien Cassou damien.cas...@gmail.com wrote:
 
 Thanks Max for the report. Do you have an idea on how we could solve the 
 problem ? The previous behaviour was not acceptable either because the 
 streams that came out of a memory filesystem were the only ones with binary 
 content
 
 On Dec 3, 2013 5:35 PM, Max Leske maxle...@gmail.com wrote:
 Damien, Marcus
 
 this change breaks a lot of things in FileSystem-Git. I don’t disagree with 
 the idea that reading characters should be default (one could argue about 
 it…) but your change makes it IMPOSSIBLE to read bytes because unprintable 
 characters are discarded! So if my ByteArray is a NULL terminated string, 
 for instance, I can not check for the NULL termination anymore.
 
 Cheers,
 Max
 



Re: [Pharo-dev] 12259: FileSystem memory reads/writes using a binary stream by default

2013-12-04 Thread Max Leske
Let me see what I can come up with.


On 03.12.2013, at 19:36, Damien Cassou damien.cas...@gmail.com wrote:

 Thanks Max for the report. Do you have an idea on how we could solve the 
 problem ? The previous behaviour was not acceptable either because the 
 streams that came out of a memory filesystem were the only ones with binary 
 content
 
 On Dec 3, 2013 5:35 PM, Max Leske maxle...@gmail.com wrote:
 Damien, Marcus
 
 this change breaks a lot of things in FileSystem-Git. I don’t disagree with 
 the idea that reading characters should be default (one could argue about 
 it…) but your change makes it IMPOSSIBLE to read bytes because unprintable 
 characters are discarded! So if my ByteArray is a NULL terminated string, for 
 instance, I can not check for the NULL termination anymore.
 
 Cheers,
 Max



Re: [Pharo-dev] 12259: FileSystem memory reads/writes using a binary stream by default

2013-12-04 Thread Sven Van Caekenberghe

On 04 Dec 2013, at 13:14, Max Leske maxle...@gmail.com wrote:

 Let me see what I can come up with.

To be somewhat compatible with existing streams, your memory stream should have 
the concept of ‘i am binary or textual’ and be able to switch in-place between 
these two states (#binary, #ascii). Depending on its state it should then 
return bytes or characters (or collections thereof). Have a look at 
ZnLimitedReadStream#next and/or ZnBivalentWriteStream#nextPut:

To be 100% correct, encoding should be selectable. But maybe a default to utf-8 
would be enough. You could have a look at ZnCharacter[Read|Write]Stream for 
inspiration.
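
A minimal sketch of such a bivalent memory stream might look like the following. The class BivalentMemoryReadStream and all of its selectors are hypothetical and not part of any image. It maps bytes to characters one to one (effectively Latin-1) instead of making the encoding selectable, and it defaults to text mode. The methods are written in the usual `Class >> selector` notation and would be entered in a browser; the last part is Playground code.

```smalltalk
"Hypothetical class, for illustration only."
Object subclass: #BivalentMemoryReadStream
    instanceVariableNames: 'bytes position binary'
    classVariableNames: ''
    poolDictionaries: ''
    category: 'Sketch-Streams'.

BivalentMemoryReadStream >> on: aByteArray
    "Always store bytes; start in text mode."
    bytes := aByteArray.
    position := 0.
    binary := false

BivalentMemoryReadStream >> binary
    binary := true

BivalentMemoryReadStream >> ascii
    binary := false

BivalentMemoryReadStream >> atEnd
    ^ position >= bytes size

BivalentMemoryReadStream >> next
    "Answer a byte or a Character depending on the current mode."
    | byte |
    self atEnd ifTrue: [ ^ nil ].
    byte := bytes at: (position := position + 1).
    ^ binary
        ifTrue: [ byte ]
        ifFalse: [ Character value: byte ]

"Playground usage"
| stream first second third |
stream := BivalentMemoryReadStream new on: #[72 105 0].
first := stream next.    "$H"
second := stream next.   "$i"
stream binary.
third := stream next.    "0, the terminating NUL read as a byte"
```

Because only the mode flag changes, switching midstream does not touch the stored bytes, so nothing is lost when going back and forth.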

My 2c.

 On 03.12.2013, at 19:36, Damien Cassou damien.cas...@gmail.com wrote:
 
 Thanks Max for the report. Do you have an idea on how we could solve the 
 problem ? The previous behaviour was not acceptable either because the 
 streams that came out of a memory filesystem were the only ones with binary 
 content
 
 On Dec 3, 2013 5:35 PM, Max Leske maxle...@gmail.com wrote:
 Damien, Marcus
 
 this change breaks a lot of things in FileSystem-Git. I don’t disagree with 
 the idea that reading characters should be default (one could argue about 
 it…) but your change makes it IMPOSSIBLE to read bytes because unprintable 
 characters are discarded! So if my ByteArray is a NULL terminated string, 
 for instance, I can not check for the NULL termination anymore.
 
 Cheers,
 Max
 




Re: [Pharo-dev] 12259: FileSystem memory reads/writes using a binary stream by default

2013-12-04 Thread Nicolas Cellier
Here is how it would be done the Xtreams way, IMHO:
1) At the end, the file should be handled as binary.
2) A facade BivalentReadStream (and BivalentWriteStream) would be created,
able to switch encoding
3) The implementation would be to wrap either directly over the binary
stream, or indirectly over an encoded stream which wraps over the binary
stream.
4) For write, the facade should just ensure that buffers (if any) are
properly flushed, and then just delegate to specialized wrapped streams.
5) For read, it might be a bit more involved if some buffer were used in
intermediate wrappers, because they should push back the excess bytes read
and restore the state of lower level binary stream...
In all case, all should happen thru delegation and the facade should be
rather dumb.
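
Points 3 and 4 can be sketched as follows, in Python rather than Smalltalk so the example runs on its own. The class name and the method names are hypothetical and do not come from Xtreams; the encoded stream is Python's standard io.TextIOWrapper, assumed here to stand in for an encoding wrapper over the binary stream.

```python
import io


class BivalentWriteStream:
    """Hypothetical dumb facade: each write goes to one of two wrapped streams."""

    def __init__(self, binary_stream, encoding="utf-8"):
        self._binary = binary_stream
        # Encoded stream wrapping the binary one. newline="" turns off
        # newline translation so characters map to bytes predictably.
        self._text = io.TextIOWrapper(binary_stream, encoding=encoding, newline="")
        self._is_binary = False

    def binary(self):
        # Flush buffered characters before bytes are written directly,
        # otherwise they would land after the bytes that follow.
        self._text.flush()
        self._is_binary = True
        return self

    def ascii(self):
        self._is_binary = False
        return self

    def next_put_all(self, data):
        if self._is_binary:
            self._binary.write(bytes(data))
        else:
            self._text.write(data)
        return self

    def flush(self):
        self._text.flush()
        self._binary.flush()


backing = io.BytesIO()
stream = BivalentWriteStream(backing)
stream.next_put_all("héllo")
stream.binary().next_put_all(b"\x00\xff")
stream.ascii().next_put_all("!")
stream.flush()
contents = backing.getvalue()
```

The facade holds no data of its own. The only state is which wrapped stream receives the next write, and the flush on switching keeps the byte order intact. The read side (point 5) is not covered here, since it would also need to push back bytes that a buffering decoder has read ahead.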


2013/12/4 Sven Van Caekenberghe s...@stfx.eu


 On 04 Dec 2013, at 13:14, Max Leske maxle...@gmail.com wrote:

  Let me see what I can come up with.

 To be somewhat compatible with existing streams, your memory stream should
 have the concept of ‘i am binary or textual’ and be able to switch in-place
 between these two states (#binary, #ascii). Depending on its state it
 should then return bytes or characters (or collections thereof). Have a
 look at ZnLimitedReadStream#next and/or ZnBivalentWriteStream#nextPut:

 To be 100% correct, encoding should be selectable. But maybe a default to
 utf-8 would be enough. You could have a look at
 ZnCharacter[Read|Write]Stream for inspiration.

 My 2c.

  On 03.12.2013, at 19:36, Damien Cassou damien.cas...@gmail.com wrote:
 
  Thanks Max for the report. Do you have an idea on how we could solve
 the problem? The previous behaviour was not acceptable either because the
 streams that came out of a memory filesystem were the only ones with binary
 content.
 
  On Dec 3, 2013 5:35 PM, Max Leske maxle...@gmail.com wrote:
  Damien, Marcus
 
  this change breaks a lot of things in FileSystem-Git. I don’t disagree
 with the idea that reading characters should be default (one could argue
 about it…) but your change makes it IMPOSSIBLE to read bytes because
 unprintable characters are discarded! So if my ByteArray is a NULL
 terminated string, for instance, I can not check for the NULL termination
 anymore.
 
  Cheers,
  Max
 





Re: [Pharo-dev] 12259: FileSystem memory reads/writes using a binary stream by default

2013-12-04 Thread Max Leske
Thanks for the input.

The problem with ReadStream and WriteStream is that they (at least in 2.0 and
3.0) never supported switching in the first place: #binary and #ascii simply
answer self. That means the collection the stream operates on predetermines the
output. I’ve noticed a couple of different approaches to this problem in the
way streams are used throughout the image, and I want to list them to see
whether there’s some kind of pattern that leads to an acceptable version with
few changes, as opposed to an optimal version where we have to change a lot of
the implementation details of streams.
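
To illustrate the behaviour described above, here is a small Python analogue (not Pharo code; the class is a hypothetical stand-in for ReadStream). The mode selectors answer the receiver unchanged, so the element type depends only on the underlying collection.

```python
class ReadStream:
    """Hypothetical stand-in: the collection alone decides what next() answers."""

    def __init__(self, collection):
        self._collection = collection
        self._position = 0

    def binary(self):
        # No switching is supported: simply answer self.
        return self

    def ascii(self):
        return self

    def next(self):
        if self._position >= len(self._collection):
            return None
        element = self._collection[self._position]
        self._position += 1
        return element


# Asking for text on a byte collection still yields a byte value,
# and asking for binary on a string still yields a character.
from_bytes = ReadStream(b"hi\x00").ascii().next()
from_string = ReadStream("hi\x00").binary().next()
```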

Cheers,
Max


On 04.12.2013, at 19:45, Nicolas Cellier nicolas.cellier.aka.n...@gmail.com 
wrote:

 Here would be the Xtreams way IMHO:
 1) In the end, the file should be handled as binary.
 2) A facade BivalentReadStream (and BivalentWriteStream) would be created,
 able to switch encoding.
 3) The implementation would be to wrap either directly over the binary
 stream, or indirectly over an encoded stream which wraps over the binary
 stream.
 4) For writing, the facade should just ensure that buffers (if any) are
 properly flushed, and then just delegate to the specialized wrapped streams.
 5) For reading, it might be a bit more involved if some buffers were used in
 intermediate wrappers, because they should push back the excess bytes read
 and restore the state of the lower-level binary stream...
 In any case, everything should happen through delegation and the facade
 should be rather dumb.
 
 
 2013/12/4 Sven Van Caekenberghe s...@stfx.eu
 
 On 04 Dec 2013, at 13:14, Max Leske maxle...@gmail.com wrote:
 
  Let me see what I can come up with.
 
 To be somewhat compatible with existing streams, your memory stream should 
 have the concept of ‘i am binary or textual’ and be able to switch in-place 
 between these two states (#binary, #ascii). Depending on its state it should 
 then return bytes or characters (or collections thereof). Have a look at 
 ZnLimitedReadStream#next and/or ZnBivalentWriteStream#nextPut:
 
 To be 100% correct, encoding should be selectable. But maybe a default to 
 utf-8 would be enough. You could have a look at ZnCharacter[Read|Write]Stream 
 for inspiration.
 
 My 2c.
 
  On 03.12.2013, at 19:36, Damien Cassou damien.cas...@gmail.com wrote:
 
  Thanks Max for the report. Do you have an idea on how we could solve the
  problem? The previous behaviour was not acceptable either because the
  streams that came out of a memory filesystem were the only ones with
  binary content.
 
  On Dec 3, 2013 5:35 PM, Max Leske maxle...@gmail.com wrote:
  Damien, Marcus
 
  this change breaks a lot of things in FileSystem-Git. I don’t disagree 
  with the idea that reading characters should be default (one could argue 
  about it…) but your change makes it IMPOSSIBLE to read bytes because 
  unprintable characters are discarded! So if my ByteArray is a NULL 
  terminated string, for instance, I can not check for the NULL termination 
  anymore.
 
  Cheers,
  Max
 
 
 
 



Re: [Pharo-dev] 12259: FileSystem memory reads/writes using a binary stream by default

2013-12-04 Thread Stéphane Ducasse

On Dec 5, 2013, at 7:48 AM, Max Leske maxle...@gmail.com wrote:

 Thanks for the input.
 
 The problem with ReadStream and WriteStream is that they (at least in 2.0 and
 3.0) never supported switching in the first place: #binary and #ascii simply
 answer self. That means the collection the stream operates on predetermines
 the output. I’ve noticed a couple of different approaches to this problem in
 the way streams are used throughout the image, and I want to list them to see
 whether there’s some kind of pattern that leads to an acceptable version with
 few changes, as opposed to an optimal version where we have to change a lot
 of the implementation details of streams.

Thanks, such an analysis is indeed important.

 
 Cheers,
 Max
 
 
 On 04.12.2013, at 19:45, Nicolas Cellier nicolas.cellier.aka.n...@gmail.com 
 wrote:
 
 Here would be the Xtreams way IMHO:
 1) In the end, the file should be handled as binary.
 2) A facade BivalentReadStream (and BivalentWriteStream) would be created,
 able to switch encoding.
 3) The implementation would be to wrap either directly over the binary
 stream, or indirectly over an encoded stream which wraps over the binary
 stream.
 4) For writing, the facade should just ensure that buffers (if any) are
 properly flushed, and then just delegate to the specialized wrapped streams.
 5) For reading, it might be a bit more involved if some buffers were used in
 intermediate wrappers, because they should push back the excess bytes read
 and restore the state of the lower-level binary stream...
 In any case, everything should happen through delegation and the facade
 should be rather dumb.
 
 
 2013/12/4 Sven Van Caekenberghe s...@stfx.eu
 
 On 04 Dec 2013, at 13:14, Max Leske maxle...@gmail.com wrote:
 
  Let me see what I can come up with.
 
 To be somewhat compatible with existing streams, your memory stream should 
 have the concept of ‘i am binary or textual’ and be able to switch in-place 
 between these two states (#binary, #ascii). Depending on its state it should 
 then return bytes or characters (or collections thereof). Have a look at 
 ZnLimitedReadStream#next and/or ZnBivalentWriteStream#nextPut:
 
 To be 100% correct, encoding should be selectable. But maybe a default to 
 utf-8 would be enough. You could have a look at 
 ZnCharacter[Read|Write]Stream for inspiration.
 
 My 2c.
 
  On 03.12.2013, at 19:36, Damien Cassou damien.cas...@gmail.com wrote:
 
  Thanks Max for the report. Do you have an idea on how we could solve the
  problem? The previous behaviour was not acceptable either because the
  streams that came out of a memory filesystem were the only ones with
  binary content.
 
  On Dec 3, 2013 5:35 PM, Max Leske maxle...@gmail.com wrote:
  Damien, Marcus
 
  this change breaks a lot of things in FileSystem-Git. I don’t disagree 
  with the idea that reading characters should be default (one could argue 
  about it…) but your change makes it IMPOSSIBLE to read bytes because 
  unprintable characters are discarded! So if my ByteArray is a NULL 
  terminated string, for instance, I can not check for the NULL termination 
  anymore.
 
  Cheers,
  Max
 
 
 
 
 



Re: [Pharo-dev] 12259: FileSystem memory reads/writes using a binary stream by default

2013-12-03 Thread Damien Cassou
Thanks Max for the report. Do you have an idea on how we could solve the
problem? The previous behaviour was not acceptable either because the
streams that came out of a memory filesystem were the only ones with binary
content.
On Dec 3, 2013 5:35 PM, Max Leske maxle...@gmail.com wrote:

 Damien, Marcus

 this change breaks a lot of things in FileSystem-Git. I don’t disagree
 with the idea that reading characters should be default (one could argue
 about it…) but your change makes it IMPOSSIBLE to read bytes because
 unprintable characters are discarded! So if my ByteArray is a NULL
 terminated string, for instance, I can not check for the NULL termination
 anymore.

 Cheers,
 Max
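
The NULL-termination problem Max reports can be illustrated with a short Python sketch. The filtering step is an assumption made for illustration: the thread only says that unprintable characters are discarded, not how, and the sample data is invented.

```python
# Invented sample: a NULL-terminated byte string.
raw = b"hello\x00"

# Hypothetical lossy conversion that keeps only printable characters,
# standing in for whatever discards them in the reported change.
as_text = "".join(chr(byte) for byte in raw if chr(byte).isprintable())

# The terminator can be checked on the bytes but not on the converted text.
terminator_in_bytes = raw.endswith(b"\x00")
terminator_in_text = as_text.endswith("\x00")
```

Once the conversion has run, nothing in the text tells the reader whether the terminator was ever present, which is why byte access has to remain available.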