First, a quick update: after doing some work on primitiveFileAtEnd, #atEnd now answers correctly for files that don't report their size correctly, e.g. /dev/urandom and /proc/cpuinfo, whether the files are opened directly or redirected through stdin.

However determining whether stdin from a terminal has reached the end of file can't be done without making #atEnd blocking since we have to wait for the user to flag the end of file, e.g. by typing Ctrl-D. And #atEnd is assumed to be non-blocking. So currently using ZnCharacterReadStream with stdin from a terminal will result in a stack dump similar to:

MessageNotUnderstood: receiver of "<" is nil
UndefinedObject(Object)>>doesNotUnderstand: #<
ZnUTF8Encoder>>nextCodePointFromStream:
ZnUTF8Encoder(ZnCharacterEncoder)>>nextFromStream:
ZnCharacterReadStream>>nextElement
ZnCharacterReadStream(ZnEncodedReadStream)>>next
UndefinedObject>>DoIt

Going back through the various suggestions that have been made regarding using a sentinel object vs. raising a notification / exception, my (still to be polished) suggestion is to:

1. Add an endOfStream instance variable
2. When the end of the stream is reached answer the value of the instance variable (i.e. the result of sending #value to the variable).
3. The initial default value would be a block that raises a Deprecation warning and then returns nil. This would allow existing code to function for a changeover period.
4. At the end of the deprecation period the default value would be changed to a unique sentinel object which would answer itself as its #value.

At any time users of the stream can set their own sentinel, including a block that raises an exception.

Cheers,
Alistair


On 4 April 2018 at 19:24, Stephane Ducasse <[email protected]> wrote:
> Thanks for this discussion.
>
> On Wed, Apr 4, 2018 at 1:37 PM, Sven Van Caekenberghe <[email protected]> wrote:
>> Alistair,
>>
>> First off, thanks for the discussions and your contributions, I really
>> appreciate them.
>>
>> But I want to have a discussion at the high level of the definition and
>> semantics of the stream API in Pharo.
>>
>>> On 4 Apr 2018, at 13:20, Alistair Grant <[email protected]> wrote:
>>>
>>> On 4 April 2018 at 12:56, Sven Van Caekenberghe <[email protected]> wrote:
>>>> Playing a bit devil's advocate, the idea is that, in general,
>>>>
>>>>     [ stream atEnd ] whileFalse: [ stream next. "..." ].
>>>>
>>>> is no longer allowed ?
>>>
>>> It hasn't been allowed "forever" [1]. It's just been misused for
>>> almost as long.
>>>
>>> [1] Time began when stdio stream support was introduced. :-)
>>
>> I am still not convinced. Another way to put it would be that the old #atEnd
>> or #upToEnd do not make sense for these streams and some new loop is needed,
>> based on a new test (it exists for socket streams already).
>>
>>     [ stream isDataAvailable ] whileTrue: [ stream next ]
>>
>>>> And you want to replace it with
>>>>
>>>>     [ stream next ifNil: [ false ] ifNotNil: [ :x | "..." true ] ] whileTrue.
>>>>
>>>> That is a pretty big change, no ?
>>>
>>> That's the way quite a bit of code already operates.
>>>
>>> As Denis pointed out, it's obviously problematic in the general sense,
>>> since nil can be embedded in non-byte oriented streams. I suspect
>>> that in practice not many people write code that reads streams from
>>> both byte oriented and non-byte oriented streams.
>>
>> Maybe yes, maybe no. As Denis' example shows there is a clear definition
>> problem.
>>
>> And I do use streams of byte arrays or strings all the time, this is really
>> important. I want my parsers to work on all kinds of streams.
>>
>>>> I think/feel like a proper EOF exception would be better, more correct.
>>>>
>>>>     [ [ stream next. "..." true ] on: EOF do: [ false ] ] whileTrue.
>>>
>>> I agree, but the email thread Nicolas pointed to raises some
>>> performance questions about this approach. It should be
>>> straightforward to do a basic performance comparison which I'll get
>>> around to if other objections aren't raised.
>>
>> Reading in bigger blocks, using #readInto:startingAt:count: (which is
>> basically Unix's read(2) sys call), would solve performance problems, I
>> think.
>>
>>>> Will we throw away #atEnd then ? Do we need it if we cannot use it ?
>>>
>>> Unix file i/o returns EOF if the end of file has been reached OR if an
>>> error occurs. You should still check #atEnd after reading past the
>>> end of the file to make sure no error occurred. Another part of the
>>> primitive change I'm proposing is to return additional information
>>> about what went wrong in the event of an error.
>>
>> I am sorry, but this kind of semantics (the OR) is way too complex at the
>> general image level, it is too specific and based on certain underlying
>> implementation details.
>>
>> Sven
>>
>>> We could modify the read primitive so that it fails if an error has
>>> occurred, and then #atEnd wouldn't be required.
>>>
>>> Cheers,
>>> Alistair
>>>
>>>
>>>
>>>>> On 4 Apr 2018, at 12:41, Alistair Grant <[email protected]> wrote:
>>>>>
>>>>> Hi Nicolas,
>>>>>
>>>>> On 4 April 2018 at 12:36, Nicolas Cellier
>>>>> <[email protected]> wrote:
>>>>>>
>>>>>>
>>>>>> 2018-04-04 12:18 GMT+02:00 Alistair Grant <[email protected]>:
>>>>>>>
>>>>>>> Hi Sven,
>>>>>>>
>>>>>>> On Wed, Apr 04, 2018 at 11:32:02AM +0200, Sven Van Caekenberghe wrote:
>>>>>>>> Somehow, somewhere there was a change to the implementation of the
>>>>>>>> primitive called by some streams' #atEnd.
>>>>>>>
>>>>>>> That's a proposed change by me, but it hasn't been integrated yet. So
>>>>>>> the discussion below should apply to the current stable vm (from August
>>>>>>> last year).
>>>>>>>
>>>>>>>
>>>>>>>> IIRC, someone said it is implemented as 'remaining size being zero'
>>>>>>>> and some virtual unix files like /dev/random are zero sized.
>>>>>>>
>>>>>>> Currently, for files other than stdio (stdout, stderr, stdin) it is
>>>>>>> effectively defined as:
>>>>>>>
>>>>>>>     atEnd := stream position >= stream size
>>>>>>>
>>>>>>>
>>>>>>> And, as you say, plenty of virtual unix files report size 0.
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>> Now, all kinds of changes are being done image side to work around
>>>>>>>> this.
>>>>>>>
>>>>>>> I would phrase this slightly differently :-)
>>>>>>>
>>>>>>> Some code does the right thing, while other code doesn't. E.g.:
>>>>>>>
>>>>>>> MultiByteFileStream>>upToEnd is good, while
>>>>>>> FileStream>>contents is incorrect
>>>>>>>
>>>>>>>
>>>>>>>> I am a strong believer in simple, real (i.e. infinite) streams, but I
>>>>>>>> am not sure we are doing the right thing here.
>>>>>>>>
>>>>>>>> Point is, I am not sure #next returning nil is official and universal.
>>>>>>>>
>>>>>>>> Consider the comments:
>>>>>>>>
>>>>>>>> Stream>>#next
>>>>>>>> "Answer the next object accessible by the receiver."
>>>>>>>>
>>>>>>>> ReadStream>>#next
>>>>>>>> "Primitive. Answer the next object in the Stream represented by the
>>>>>>>> receiver. Fail if the collection of this stream is not an Array or a
>>>>>>>> String. Fail if the stream is positioned at its end, or if the
>>>>>>>> position is out of bounds in the collection. Optional. See Object
>>>>>>>> documentation whatIsAPrimitive."
>>>>>>>>
>>>>>>>> Note how there is no talk about returning nil !
>>>>>>>>
>>>>>>>> I think we should discuss about this first.
>>>>>>>>
>>>>>>>> Was the low level change really correct and the right thing to do ?
>>>>>>>
>>>>>>> The primitive change proposed doesn't affect this discussion. It will
>>>>>>> mean that #atEnd returns false (correctly) sometimes, while currently it
>>>>>>> returns true (incorrectly). The end result is still incorrect, e.g.
>>>>>>> #contents returns an empty string for /proc/cpuinfo.
>>>>>>>
>>>>>>> You're correct about no mention of nil, but we have:
>>>>>>>
>>>>>>> FileStream>>next
>>>>>>>
>>>>>>>     (position >= readLimit and: [self atEnd])
>>>>>>>         ifTrue: [^nil]
>>>>>>>         ifFalse: [^collection at: (position := position + 1)]
>>>>>>>
>>>>>>>
>>>>>>> which has been around for a long time (I suspect, before Pharo existed).
>>>>>>>
>>>>>>> Having said that, I think that raising an exception is a better
>>>>>>> solution, but it is a much, much bigger change than the one I proposed
>>>>>>> in https://github.com/pharo-project/pharo/pull/1180.
>>>>>>>
>>>>>>>
>>>>>>> Cheers,
>>>>>>> Alistair
>>>>>>>
>>>>>>
>>>>>> Hi,
>>>>>> yes, if you are after universal behavior englobing Unix streams, the
>>>>>> Exception might be the best way.
>>>>>> Because on special stream you can't always say in advance, you have to
>>>>>> try.
>>>>>> That's the solution adopted by authors of Xtreams.
>>>>>> But there is a runtime penalty associated to it.
>>>>>>
>>>>>> The penalty once was so high that my proposal to generalize EndOfStream
>>>>>> usage was rejected a few years ago by Andreas Raab.
>>>>>> http://forum.world.st/EndOfStream-unused-td68806.html
>>>>>
>>>>> Thanks for this, I'll definitely take a look.
>>>>>
>>>>> Do you have a sense of how Denis' suggestion of using an EndOfStream
>>>>> object would compare?
>>>>>
>>>>> It would keep the same coding style, but avoid the problems with nil.
>>>>>
>>>>> Thanks,
>>>>> Alistair
>>>>>
>>>>>
>>>>>
>>>>>> I have regularly benched Xtreams, but stopped a few years ago.
>>>>>> Maybe I can excavate and pass on newer VM.
>>>>>>
>>>>>> In the mean time, I had experimented a programmable end of stream
>>>>>> behavior (via a block, or any other valuable)
>>>>>> http://www.squeaksource.com/XTream.htm
>>>>>> so as to reconcile performance and universality, but it was a source of
>>>>>> complexification at implementation side.
>>>>>>
>>>>>> Nicolas
>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>> Note also that Guille introduced something new, #closed which is
>>>>>>>> related to the difference between having no more elements (maybe right
>>>>>>>> now, like an open network stream) and never ever being able to produce
>>>>>>>> more data.
>>>>>>>>
>>>>>>>> Sven
>>>>
>>>>
>>>
>>
>>
>
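
P.S. A minimal playground sketch of the endOfStream idea from the numbered list at the top of this message, in case it helps the discussion. It is only an illustration under stated assumptions, not the proposed implementation: `sentinel`, `endOfStream` and `nextOrEnd` are hypothetical temporaries standing in for the instance variable and the modified #next, and an in-memory ReadStream stands in for a file or stdin stream (so calling #atEnd is safe here).

```smalltalk
"Hypothetical sketch: apart from ReadStream, #atEnd and #next, nothing
 below is existing stream API. The temporaries are illustrative only."
| sentinel endOfStream stream nextOrEnd item collected |

"A unique object that can never be an element of the stream."
sentinel := Object new.

"Stands in for the proposed endOfStream instance variable. Users could
 install any valuable instead, e.g. a block that signals an exception."
endOfStream := [ sentinel ].

"An in-memory stream that contains nil as a legitimate element."
stream := ReadStream on: #(1 nil 3).

"Stands in for the proposed #next: answer 'endOfStream value' at the end."
nextOrEnd := [ stream atEnd
	ifTrue: [ endOfStream value ]
	ifFalse: [ stream next ] ].

"Read until the sentinel comes back, comparing by identity."
collected := OrderedCollection new.
[ (item := nextOrEnd value) == sentinel ]
	whileFalse: [ collected add: item ].

collected
```

Because the loop tests identity against the sentinel rather than testing for nil, the embedded nil is collected like any other element, which is the case Denis raised. Swapping the block for one that signals an exception would give the exception-based style without changing the reading code.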
