Re: [Pharo-dev] 12259: FileSystem memory reads/writes using a binary stream by default
So, in favor of not modifying ReadStream, I implemented your suggestion #binaryReadStream in FileSystem (it will not work on all streams for now, but that's OK for me until we get Xtreams). Please take a look and let me know what you think (changing ReadStream would have taken only 2 methods vs. ~6 :) ).

Name: SLICE-Issue-12259-FileSystem-memory-readswrites-using-a-binary-stream-by-default-MaxLeske.1
Author: MaxLeske
Time: 25 January 2014, 11:38:48.027093 pm
UUID: 1248d0f9-f094-400f-a736-23fd41d12e50
Ancestors:
Dependencies: FileSystem-Tests-Core-MaxLeske.67, FileSystem-Memory-MaxLeske.46, FileSystem-Disk-MaxLeske.72, FileSystem-Core-MaxLeske.139

* added #binaryReadStream and friends to FileSystem (for compatibility)

Cheers,
Max

On 10.12.2013, at 22:19, Nicolas Cellier <nicolas.cellier.aka.n...@gmail.com> wrote:

> The more you add, the more you'll have to remove
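A minimal Playground sketch of what a binary API on a memory file system makes possible: raw bytes, including a NUL, go in and come back out unchanged. It assumes an image where FileReference understands #binaryWriteStreamDo: and #binaryReadStreamDo: (later Pharo releases have them); the exact selectors added by the 2014 slice ("#binaryReadStream and friends") may differ, and the file name 'blob.bin' is made up.

```smalltalk
"Sketch: round-trip raw bytes through an in-memory file.
Assumes FileReference>>#binaryWriteStreamDo: and #binaryReadStreamDo: exist;
'blob.bin' is a hypothetical file name."
| fs ref bytes |
fs := FileSystem memory.
ref := fs root / 'blob.bin'.
ref binaryWriteStreamDo: [ :out | out nextPutAll: #[104 105 0] ].
"Read everything back as bytes; the trailing 0 should still be there."
bytes := ref binaryReadStreamDo: [ :in | in upToEnd ].
bytes.
```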
Re: [Pharo-dev] 12259: FileSystem memory reads/writes using a binary stream by default
On Dec 5, 2013 10:50 PM, Max Leske <maxle...@gmail.com> wrote:

> There are several different approaches in different places: [...]
> @Damien: contrary to what I wrote earlier, #asString does *not* destroy non-printable characters. Instead, every byte (from 0 to 255) is encoded as a character, and thus the string can be converted back to a ByteArray *without* loss of information. Sorry about that.

Thank you very much for your analysis. I'm a bit reluctant to change ReadStream now, but I could be OK with it given enough unit tests.

Another potential solution: what about adding a #binaryReadStream method to the memory file system as a workaround until Xtreams is introduced?

> With this change in place, 12259 would become obsolete. Please let me know what you think. This is a pretty big change that might have a lot of consequences in the image.
Re: [Pharo-dev] 12259: FileSystem memory reads/writes using a binary stream by default
The more you add, the more you'll have to remove

2013/12/10 Damien Cassou <damien.cas...@gmail.com>

> Another potential solution: what about adding a #binaryReadStream method to the memory file system as a workaround until Xtreams is introduced? [...]
Re: [Pharo-dev] 12259: FileSystem memory reads/writes using a binary stream by default
Hem, switching #ascii <-> #binary only makes sense in... ASCII. With every other encoding it doesn't make sense at all, or maybe only as #latin1 <-> #binary, #utf8 <-> #binary, #utf16 <-> #binary.

2013/12/5 Max Leske <maxle...@gmail.com>

> There are several different approaches in different places: [...]
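Nicolas's point can be made concrete: the same character maps to different byte sequences depending on the encoding, so a plain #ascii/#binary switch is only lossless for a single-byte encoding. A Playground sketch, assuming String>>#asByteArray copies one byte per character (Latin-1 style) and that String>>#utf8Encoded is available, as in later Pharo releases.

```smalltalk
"One character, two encodings.
Assumes #asByteArray copies one byte per character and #utf8Encoded exists."
| text latin1 utf8 |
text := String with: (Character value: 233). "é, code point U+00E9"
latin1 := text asByteArray. "#[233]: one byte, what a naive #ascii <-> #binary switch gives"
utf8 := text utf8Encoded. "#[195 169]: two bytes for the same character"
{ latin1. utf8 }.
```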
Re: [Pharo-dev] 12259: FileSystem memory reads/writes using a binary stream by default
I agree, but that is a problem inherent to the current implementation, and it’s not really my goal now to fix all the shortcomings :) I simply want a consistent way to get through this (since I’ve heard that the streams might be replaced with Xtreams…).

On 07.12.2013, at 00:44, Nicolas Cellier <nicolas.cellier.aka.n...@gmail.com> wrote:

> Hem, switching #ascii <-> #binary only makes sense in... ASCII. [...]
Re: [Pharo-dev] 12259: FileSystem memory reads/writes using a binary stream by default
On 05 Dec 2013, at 7:48, Max Leske <maxle...@gmail.com> wrote:

> Thanks for the input. The problem with ReadStream and WriteStream is that they (at least in 2.0 and 3.0) never supported switching in the first place. #binary and #ascii simply answer self. That means the collection the stream operates on predetermines the output. [...]

In my experience, the legacy solution for swapping binary/ascii writing to in-memory collections is using a (MultiByte)BinaryOrTextStream instead of a standard ReadWriteStream.

Cheers,
Henry

P.S. In fact, that class is pretty much the ultimate example of why the Xtreams approach is superior. It *looks* like it’s interchangeable with (MultiByte)FileStream, but the implementations are completely independent, and thus a subtle set of differences is bound to exist (and does, but I can’t remember exactly which atm). Maintaining and bug-fixing either is a nightmare, and introducing further divergence bugs in the long run is pretty much inevitable. It is much better to compose singly implemented, single-purpose wrapper streams on top of dumb terminals for each destination type to achieve the desired input -> output transformations.
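A Playground sketch of the legacy approach Henry describes: text is written in the default #ascii mode and read back as bytes after switching the same stream to #binary. It assumes a Pharo 2.0/3.0-era image where RWBinaryOrTextStream is still present (it may be missing from later releases) and that its #next answers Integers in binary mode.

```smalltalk
"Write characters, then switch the same stream to binary and read bytes.
Assumes RWBinaryOrTextStream is available in the image."
| stream bytes |
stream := RWBinaryOrTextStream on: String new.
stream nextPutAll: 'hi'.
stream nextPut: (Character value: 0). "a NUL terminator"
stream reset.
stream binary. "switch in place: #next should now answer Integers"
bytes := (1 to: 3) collect: [ :each | stream next ].
bytes.
```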
Re: [Pharo-dev] 12259: FileSystem memory reads/writes using a binary stream by default
On 05.12.2013, at 10:12, Henrik Johansen <henrik.s.johan...@veloxit.no> wrote:

> In my experience, the legacy solution for swapping binary/ascii writing to in-memory collections is using a (MultiByte)BinaryOrTextStream instead of a standard ReadWriteStream.

You’re right. I had thought about simply using RWBinaryOrTextStream instead of ReadStream in MemoryFileSystem, but that felt like committing a crime… :)
Re: [Pharo-dev] 12259: FileSystem memory reads/writes using a binary stream by default
There are several different approaches in different places:

- FileStream reads strings by default. #binary and #ascii switch between formats. File streams use an internal buffer which is either a String (default) or a ByteArray. It’s even possible to switch between binary and ascii midstream without losing information (if done right), because it only affects the buffer.
- ReadStream and WriteStream cannot change their format. Their behavior is determined by the underlying collection. Forcing conversions (e.g. by #asString) can lead to loss of information.
- RWBinaryOrTextStream (and other subclasses of ReadWriteStream) also supports the #binary / #ascii way of switching format. Default is #ascii.
- SocketStream uses the same #binary / #ascii mechanism. Default is #ascii.
- ZnLimitedReadStream uses the same #binary / #ascii mechanism. Default is #binary (implicit); it depends on the underlying stream.

I think the pattern to follow is clear: ReadStream and WriteStream should allow switching format with #ascii and #binary, and the default should be #ascii. However, I suspect there’s a reason these classes don’t support switching: it makes the implementation more complicated and also slower, because more checks need to be made. The easiest solution I see would be to implement something like this:

    ReadStream>>next
        ^ self isBinary
            ifTrue: [ self basicNext asCharacter ]
            ifFalse: [ self basicNext ]

However, #next et al. are implemented as VM primitives, and the primitive method looks like this:

    ReadStream>>next
        <primitive: 65>
        position >= readLimit
            ifTrue: [ ^ nil ]
            ifFalse: [ ^ collection at: (position := position + 1) ]

This means the collection instance variable has to hold either a binary or a string collection. I’ve found a solution that would work and whipped up a working version (there’s room for improvement…):

    ReadStream>>binary
        collection isString ifFalse: [ ^ self ].
        collection := (ByteArray new: collection size)
            copyReplaceFrom: 1 to: collection size with: collection

    ReadStream>>ascii
        collection isString ifTrue: [ ^ self ].
        collection := (String new: collection size)
            copyReplaceFrom: 1 to: collection size with: collection

@Damien: contrary to what I wrote earlier, #asString does *not* destroy non-printable characters. Instead, every byte (from 0 to 255) is encoded as a character, and thus the string can be converted back to a ByteArray *without* loss of information. Sorry about that.

With this change in place, 12259 would become obsolete. Please let me know what you think. This is a pretty big change that might have a lot of consequences in the image.

Cheers,
Max

On 04.12.2013, at 13:14, Max Leske <maxle...@gmail.com> wrote:

> Let me see what I can come up with. [...]
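Two claims from Max's analysis can be checked in a Playground: a plain ReadStream answers whatever element type its collection holds, and the String/ByteArray conversion his #binary/#ascii sketch relies on is lossless for all 256 byte values. A sketch, assuming ByteArray>>#asString and String>>#asByteArray copy byte for byte, as Max reports for his image.

```smalltalk
"Playground sketch. Assumes #asString / #asByteArray copy byte for byte."
| bytes roundTrip |
bytes := ByteArray withAll: (0 to: 255).
"1. The collection decides what a plain ReadStream answers."
(ReadStream on: bytes) next. "0, an Integer"
(ReadStream on: bytes asString) next. "Character value: 0"
"2. All 256 byte values survive a String round trip, non-printables included."
roundTrip := bytes asString asByteArray.
roundTrip = bytes.
```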
Re: [Pharo-dev] 12259: FileSystem memory reads/writes using a binary stream by default
Let me see what I can come up with.

On 03.12.2013, at 19:36, Damien Cassou <damien.cas...@gmail.com> wrote:

> Thanks Max for the report. Do you have an idea of how we could solve the problem? The previous behaviour was not acceptable either, because the streams that came out of a memory filesystem were the only ones with binary content.
>
> On Dec 3, 2013 5:35 PM, Max Leske <maxle...@gmail.com> wrote:
>
>> Damien, Marcus, this change breaks a lot of things in FileSystem-Git. I don’t disagree with the idea that reading characters should be the default (one could argue about it…), but your change makes it IMPOSSIBLE to read bytes because unprintable characters are discarded! So if my ByteArray is a NULL-terminated string, for instance, I cannot check for the NULL termination anymore.
>>
>> Cheers,
>> Max
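The FileSystem-Git case Max describes needs the raw bytes: with a byte-oriented stream, finding a NUL terminator is a one-liner. A Playground sketch over a plain ReadStream; the sample bytes are made up for illustration.

```smalltalk
"Hypothetical sample data: the bytes of 'hi', a NUL terminator, then more data."
| stream name |
stream := ReadStream on: #[104 105 0 120].
name := stream upTo: 0. "answers the bytes before the NUL and consumes the NUL"
name asString.
```

Because #upTo: consumes the terminator, the stream is left positioned on the byte that follows it.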
Re: [Pharo-dev] 12259: FileSystem memory reads/writes using a binary stream by default
On 04 Dec 2013, at 13:14, Max Leske <maxle...@gmail.com> wrote:

> Let me see what I can come up with.

To be somewhat compatible with existing streams, your memory stream should have the concept of ‘I am binary or textual’ and be able to switch in place between these two states (#binary, #ascii). Depending on its state, it should then return bytes or characters (or collections thereof).

Have a look at ZnLimitedReadStream>>#next and/or ZnBivalentWriteStream>>#nextPut:

To be 100% correct, the encoding should be selectable. But maybe defaulting to UTF-8 would be enough. You could have a look at ZnCharacter[Read|Write]Stream for inspiration.

My 2c.
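Sven's idea, a stream that knows whether it is binary or textual and switches in place, can be sketched without touching ReadStream: keep the bytes as the single source of truth and convert on the way out. The block below stands in for a hypothetical bivalent stream's #next; it is an illustration, not ZnLimitedReadStream's actual code, and it assumes single-byte characters (no encoder).

```smalltalk
"Bytes are the single source of truth; a mode flag decides what #next answers.
The block plays the role of a hypothetical bivalent #next method."
| source binary nextElement first second |
source := ReadStream on: #[72 105 0].
binary := false.
nextElement := [ source atEnd
	ifTrue: [ nil ]
	ifFalse: [ binary
		ifTrue: [ source next ]
		ifFalse: [ source next asCharacter ] ] ].
first := nextElement value. "textual mode: a Character"
binary := true. "switch in place, the collection is never copied"
second := nextElement value. "binary mode: an Integer"
{ first. second }.
```

Since only the answer is converted, switching mid-stream loses nothing; a real class would also need #next:, #upToEnd and friends to honor the same flag.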
Re: [Pharo-dev] 12259: FileSystem memory reads/writes using a binary stream by default
Here would be the Xtreams way IMHO:

1) In the end, the file should be handled as binary.
2) A facade BivalentReadStream (and BivalentWriteStream) would be created, able to switch encoding.
3) The implementation would wrap either directly over the binary stream, or indirectly over an encoded stream which wraps over the binary stream.
4) For writing, the facade should just ensure that buffers (if any) are properly flushed, and then delegate to the specialized wrapped streams.
5) For reading, it might be a bit more involved if some buffers were used in intermediate wrappers, because they should push back the excess bytes read and restore the state of the lower-level binary stream...

In all cases, everything should happen through delegation, and the facade should be rather dumb.

2013/12/4 Sven Van Caekenberghe <s...@stfx.eu>

> On 04 Dec 2013, at 13:14, Max Leske <maxle...@gmail.com> wrote:
>> Let me see what I can come up with.
>
> To be somewhat compatible with existing streams, your memory stream should have the concept of ‘I am binary or textual’ and be able to switch in place between these two states (#binary, #ascii). [...]
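Nicolas's layering, a dumb binary terminal at the bottom with an encoding wrapper on top, can be approximated with the Zinc streams Sven mentions. A Playground sketch, assuming ZnCharacterReadStream class>>#on:encoding: is available in the image; the sample bytes (UTF-8 for 'hé') are made up.

```smalltalk
"Binary terminal at the bottom, a UTF-8 decoding wrapper on top.
Assumes ZnCharacterReadStream class>>#on:encoding: is available (Zinc)."
| terminal text decoded |
terminal := ReadStream on: #[104 195 169].
text := ZnCharacterReadStream on: terminal encoding: 'utf8'.
decoded := text upToEnd. "two characters decoded from three bytes"
decoded.
```

The terminal stays a plain byte stream, so a facade could hand out either the terminal or the wrapper; point 5 above is about what goes wrong when a wrapper has buffered bytes ahead of the one being switched to.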
Re: [Pharo-dev] 12259: FileSystem memory reads/writes using a binary stream by default
Thanks for the input.
The problem with ReadStream and WriteStream is that they (at least in 2.0 and 3.0) never supported switching in the first place: #binary and #ascii simply answer self. That means the collection the stream operates on predetermines the output.
I’ve noticed a couple of different approaches to this problem in the way streams are used throughout the image, and I want to make a list of them to see whether there’s a pattern that might lead to an acceptable version needing few changes, as opposed to an optimal version where we would have to change a lot of the implementation details of streams.
Cheers, Max
On 04.12.2013, at 19:45, Nicolas Cellier nicolas.cellier.aka.n...@gmail.com wrote:
Here would be the Xtream way IMHO:
1) Ultimately, the file should be handled as binary.
2) A facade BivalentReadStream (and BivalentWriteStream) would be created, able to switch encoding.
3) The implementation would wrap either directly over the binary stream, or indirectly over an encoded stream which itself wraps over the binary stream.
4) For writing, the facade should just ensure that buffers (if any) are properly flushed, and then simply delegate to the specialized wrapped streams.
5) For reading, it might be a bit more involved if some buffers were used in intermediate wrappers, because they would have to push back the excess bytes read and restore the state of the lower-level binary stream...
In all cases, everything should happen through delegation, and the facade should be rather dumb.
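Max's point that the collection predetermines the output can be checked in a Playground. A minimal sketch, assuming a Pharo 2.0/3.0-era image; the byte values are arbitrary. It also shows that converting a ByteArray to a String and back keeps every byte, including a NULL terminator, since each byte 0-255 becomes one Character.

```smalltalk
"Playground check: the class of the collection decides what #next answers."
| bytes |
bytes := #[72 105 0].
(ReadStream on: bytes) next.              "72, a SmallInteger"
(ReadStream on: bytes asString) next.     "$H, a Character"
"Each byte 0-255 becomes one Character, so the round trip keeps the NULL byte."
bytes asString asByteArray = bytes.       "true"
(ReadStream on: bytes asString) upToEnd last asInteger.   "0"
```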
Re: [Pharo-dev] 12259: FileSystem memory reads/writes using a binary stream by default
On Dec 5, 2013, at 7:48 AM, Max Leske maxle...@gmail.com wrote:
Thanks for the input. The problem with ReadStream and WriteStream is that they (at least in 2.0 and 3.0) never supported switching in the first place: #binary and #ascii simply answer self. That means the collection the stream operates on predetermines the output. I’ve noticed a couple of different approaches to this problem in the way streams are used throughout the image, and I want to make a list of them to see whether there’s a pattern that might lead to an acceptable version needing few changes, as opposed to an optimal version where we would have to change a lot of the implementation details of streams.
Thanks, such an analysis is indeed important.
Cheers, Max
Re: [Pharo-dev] 12259: FileSystem memory reads/writes using a binary stream by default
Thanks Max for the report. Do you have an idea of how we could solve the problem? The previous behaviour was not acceptable either, because the streams that came out of a memory filesystem were the only ones with binary content.
On Dec 3, 2013 5:35 PM, Max Leske maxle...@gmail.com wrote:
Damien, Marcus, this change breaks a lot of things in FileSystem-Git. I don’t disagree with the idea that reading characters should be the default (one could argue about it…), but your change makes it IMPOSSIBLE to read bytes because unprintable characters are discarded! So if my ByteArray is a NULL-terminated string, for instance, I cannot check for the NULL termination anymore.
Cheers, Max