Alistair,

First off, thanks for the discussions and your contributions; I really appreciate them.
But I want to have a discussion at a high level about the definition and semantics of the stream API in Pharo.

> On 4 Apr 2018, at 13:20, Alistair Grant <akgrant0...@gmail.com> wrote:
>
> On 4 April 2018 at 12:56, Sven Van Caekenberghe <s...@stfx.eu> wrote:
>> Playing devil's advocate a bit, the idea is that, in general,
>>
>> [ stream atEnd ] whileFalse: [ stream next. "..." ].
>>
>> is no longer allowed?
>
> It hasn't been allowed "forever" [1]. It's just been misused for
> almost as long.
>
> [1] Time began when stdio stream support was introduced. :-)

I am still not convinced.

Another way to put it would be that the old #atEnd or #upToEnd do not make sense for these streams, and that some new loop is needed, based on a new test (one already exists for socket streams):

[ stream isDataAvailable ] whileTrue: [ stream next ]

>> And you want to replace it with
>>
>> [ stream next ifNil: [ false ] ifNotNil: [ :x | "..." true ] ] whileTrue.
>>
>> That is a pretty big change, no?
>
> That's the way quite a bit of code already operates.
>
> As Denis pointed out, it's obviously problematic in the general sense,
> since nil can be embedded in non-byte-oriented streams. I suspect
> that in practice not many people write code that reads from
> both byte-oriented and non-byte-oriented streams.

Maybe yes, maybe no. As Denis' example shows, there is a clear definition problem.

And I do use streams over byte arrays or strings all the time; this is really important. I want my parsers to work on all kinds of streams.

>> I think/feel that a proper EOF exception would be better, more correct.
>>
>> [ [ stream next. "..." true ] on: EOF do: [ false ] ] whileTrue.
>
> I agree, but the email thread Nicolas pointed to raises some
> performance questions about this approach. It should be
> straightforward to do a basic performance comparison, which I'll get
> around to if other objections aren't raised.
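To make Denis' definition problem concrete, here is a small Playground sketch, assuming a recent Pharo image in which ReadStream>>next answers nil at the end. It runs both the #atEnd loop and the nil-sentinel loop over an in-memory ReadStream that contains an embedded nil; the variable names are only illustrative. The EOF exception variant is not shown, because EOF in the snippet above is a proposed class, not an existing one.

```smalltalk
"Playground sketch: compare the #atEnd loop with the nil-sentinel loop
 on an in-memory stream whose elements include nil."
| stream atEndResult nilResult |
"Classic idiom: test #atEnd before every #next."
atEndResult := OrderedCollection new.
stream := ReadStream on: (Array with: 1 with: nil with: 3).
[ stream atEnd ] whileFalse: [ atEndResult add: stream next ].

"nil-sentinel idiom: an embedded nil looks exactly like end of stream."
nilResult := OrderedCollection new.
stream := ReadStream on: (Array with: 1 with: nil with: 3).
[ stream next
	ifNil: [ false ]
	ifNotNil: [ :each | nilResult add: each. true ] ] whileTrue.

{ atEndResult asArray. nilResult asArray }
```

The #atEnd loop collects all three elements, while the nil-sentinel loop stops after the first one, which is exactly why nil cannot be the universal end-of-stream signal for non-byte streams.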
Reading in bigger blocks, using #readInto:startingAt:count: (which is basically Unix's read(2) system call), would solve the performance problems, I think.

>> Will we throw away #atEnd then? Do we need it if we cannot use it?
>
> Unix file i/o returns EOF if the end of file has been reached OR if an
> error occurs. You should still check #atEnd after reading past the
> end of the file to make sure no error occurred. Another part of the
> primitive change I'm proposing is to return additional information
> about what went wrong in the event of an error.

I am sorry, but this kind of semantics (the OR) is way too complex at the general image level; it is too specific and based on certain underlying implementation details.

Sven

> We could modify the read primitive so that it fails if an error has
> occurred, and then #atEnd wouldn't be required.
>
> Cheers,
> Alistair
>
>>> On 4 Apr 2018, at 12:41, Alistair Grant <akgrant0...@gmail.com> wrote:
>>>
>>> Hi Nicolas,
>>>
>>> On 4 April 2018 at 12:36, Nicolas Cellier
>>> <nicolas.cellier.aka.n...@gmail.com> wrote:
>>>>
>>>> 2018-04-04 12:18 GMT+02:00 Alistair Grant <akgrant0...@gmail.com>:
>>>>>
>>>>> Hi Sven,
>>>>>
>>>>> On Wed, Apr 04, 2018 at 11:32:02AM +0200, Sven Van Caekenberghe wrote:
>>>>>> Somehow, somewhere there was a change to the implementation of the
>>>>>> primitive called by some streams' #atEnd.
>>>>>
>>>>> That's a proposed change by me, but it hasn't been integrated yet. So
>>>>> the discussion below should apply to the current stable VM (from August
>>>>> last year).
>>>>>
>>>>>> IIRC, someone said it is implemented as 'remaining size being zero'
>>>>>> and some virtual Unix files like /dev/random are zero-sized.
>>>>>
>>>>> Currently, for files other than stdio (stdout, stderr, stdin), it is
>>>>> effectively defined as:
>>>>>
>>>>> atEnd := stream position >= stream size
>>>>>
>>>>> And, as you say, plenty of virtual Unix files report size 0.
>>>>>
>>>>>> Now, all kinds of changes are being made on the image side to work around this.
>>>>>
>>>>> I would phrase this slightly differently :-)
>>>>>
>>>>> Some code does the right thing, while other code doesn't. E.g.:
>>>>>
>>>>> MultiByteFileStream>>upToEnd is good, while
>>>>> FileStream>>contents is incorrect
>>>>>
>>>>>> I am a strong believer in simple, real (i.e. infinite) streams, but I
>>>>>> am not sure we are doing the right thing here.
>>>>>>
>>>>>> Point is, I am not sure #next returning nil is official and universal.
>>>>>>
>>>>>> Consider the comments:
>>>>>>
>>>>>> Stream>>#next
>>>>>> "Answer the next object accessible by the receiver."
>>>>>>
>>>>>> ReadStream>>#next
>>>>>> "Primitive. Answer the next object in the Stream represented by the
>>>>>> receiver. Fail if the collection of this stream is not an Array or a
>>>>>> String. Fail if the stream is positioned at its end, or if the position
>>>>>> is out of bounds in the collection. Optional. See Object documentation
>>>>>> whatIsAPrimitive."
>>>>>>
>>>>>> Note how there is no talk about returning nil!
>>>>>>
>>>>>> I think we should discuss this first.
>>>>>>
>>>>>> Was the low-level change really correct and the right thing to do?
>>>>>
>>>>> The primitive change proposed doesn't affect this discussion. It will
>>>>> mean that #atEnd returns false (correctly) sometimes, while currently it
>>>>> returns true (incorrectly). The end result is still incorrect, e.g.
>>>>> #contents returns an empty string for /proc/cpuinfo.
>>>>>
>>>>> You're correct about no mention of nil, but we have:
>>>>>
>>>>> FileStream>>next
>>>>>
>>>>>     (position >= readLimit and: [self atEnd])
>>>>>         ifTrue: [^nil]
>>>>>         ifFalse: [^collection at: (position := position + 1)]
>>>>>
>>>>> which has been around for a long time (I suspect, before Pharo existed).
>>>>>
>>>>> Having said that, I think that raising an exception is a better
>>>>> solution, but it is a much, much bigger change than the one I proposed
>>>>> in https://github.com/pharo-project/pharo/pull/1180.
>>>>>
>>>>> Cheers,
>>>>> Alistair
>>>>>
>>>>
>>>> Hi,
>>>> yes, if you are after universal behavior encompassing Unix streams, the
>>>> Exception might be the best way,
>>>> because on special streams you can't always tell in advance; you have to
>>>> try.
>>>> That's the solution adopted by the authors of Xtreams.
>>>> But there is a runtime penalty associated with it.
>>>>
>>>> The penalty was once so high that my proposal to generalize EndOfStream
>>>> usage was rejected a few years ago by Andreas Raab.
>>>> http://forum.world.st/EndOfStream-unused-td68806.html
>>>
>>> Thanks for this, I'll definitely take a look.
>>>
>>> Do you have a sense of how Denis' suggestion of using an EndOfStream
>>> object would compare?
>>>
>>> It would keep the same coding style, but avoid the problems with nil.
>>>
>>> Thanks,
>>> Alistair
>>>
>>>> I regularly benchmarked Xtreams, but stopped a few years ago.
>>>> Maybe I can dig the benchmarks out and run them on a newer VM.
>>>>
>>>> In the meantime, I had experimented with a programmable end-of-stream
>>>> behavior (via a block, or any other valuable)
>>>> http://www.squeaksource.com/XTream.htm
>>>> so as to reconcile performance and universality, but it added
>>>> complexity on the implementation side.
>>>>
>>>> Nicolas
>>>>
>>>>>
>>>>>> Note also that Guille introduced something new, #closed, which is
>>>>>> related to the difference between having no more elements (maybe just
>>>>>> right now, like an open network stream) and never ever being able to
>>>>>> produce more data.
>>>>>>
>>>>>> Sven
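Denis' idea of a dedicated end-of-stream object and Nicolas' programmable end-of-stream behavior can be roughly approximated in a Playground without touching any primitive. In this hypothetical sketch, nextOrEnd is an illustrative block, not an existing Pharo method: it answers the next element, or the value of a caller-supplied end block when the stream is exhausted. Passing a unique marker object keeps an embedded nil distinguishable from the end. The sketch still relies on #atEnd, which is reliable for in-memory streams; for the zero-size virtual files discussed above, the end would have to be reported by the read primitive itself.

```smalltalk
"Hypothetical sketch: unique end marker plus a caller-supplied end block,
 built on the existing #atEnd / #next protocol of an in-memory ReadStream."
| endMarker nextOrEnd stream collected |
"A fresh object can never be identical (==) to an element of the stream."
endMarker := Object new.
"Illustrative helper: answer the next element, or endBlock value at the end."
nextOrEnd := [ :aStream :endBlock |
	aStream atEnd
		ifTrue: [ endBlock value ]
		ifFalse: [ aStream next ] ].
stream := ReadStream on: (Array with: 1 with: nil with: 3).
collected := OrderedCollection new.
[ | element |
  element := nextOrEnd value: stream value: [ endMarker ].
  element == endMarker
	ifTrue: [ false ]
	ifFalse: [ collected add: element. true ] ] whileTrue.
collected asArray
```

Unlike the nil-sentinel loop, this keeps the embedded nil, and a different end block (for example one that signals an error) would give exception-style behavior from the same helper.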