Re: [Pharo-dev] Changed #atEnd primitive - #atEnd vs #next returning nil

Sven Van Caekenberghe Wed, 04 Apr 2018 04:37:58 -0700

Alistair,

First off, thanks for the discussions and your contributions; I really 
appreciate them.


But I want to have this discussion at a higher level: the definition and 
semantics of the stream API in Pharo.

> On 4 Apr 2018, at 13:20, Alistair Grant <[email protected]> wrote:
> 
> On 4 April 2018 at 12:56, Sven Van Caekenberghe <[email protected]> wrote:
>> Playing a bit devil's advocate, the idea is that, in general,
>> 
>> [ stream atEnd] whileFalse: [ stream next. "..." ].
>> 
>> is no longer allowed ?
> 
> It hasn't been allowed "forever" [1].  It's just been misused for
> almost as long.
> 
> [1] Time began when stdio stream support was introduced. :-)

I am still not convinced. Another way to put it: the old #atEnd and 
#upToEnd do not make sense for these streams, and a new loop is needed, based 
on a new test (one already exists for socket streams):

[ stream isDataAvailable ] whileTrue: [ stream next ]

>> And you want to replace it with
>> 
>> [ stream next ifNil: [ false ] ifNotNil: [ :x | "..." true ] ] whileTrue.
>> 
>> That is a pretty big change, no ?
> 
> That's the way quite a bit of code already operates.
> 
> As Denis pointed out, it's obviously problematic in the general sense,
> since nil can be embedded in non-byte-oriented streams.  I suspect
> that in practice not many people write code that reads from
> both byte-oriented and non-byte-oriented streams.

Maybe yes, maybe no. As Denis' example shows, there is a clear definition 
problem.

And I do use streams of byte arrays or strings all the time; this is really 
important. I want my parsers to work on all kinds of streams.
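A minimal playground sketch of the definition problem, assuming (as this thread describes) that ReadStream>>next answers nil when positioned at the end. Over an Array that legitimately contains nil, the #atEnd loop sees all three elements, while the nil-terminated loop stops after the first one.

```smalltalk
"Sketch: a stream over a collection that contains nil as real data."
| stream viaAtEnd viaNil |
stream := ReadStream on: { 1. nil. 3 }.

"Classic loop: test #atEnd before every #next."
viaAtEnd := OrderedCollection new.
[ stream atEnd ] whileFalse: [ viaAtEnd add: stream next ].

"nil-terminated loop: mistakes the embedded nil for the end of the stream."
stream reset.
viaNil := OrderedCollection new.
[ stream next ifNil: [ false ] ifNotNil: [ :each | viaNil add: each. true ] ] whileTrue.

"viaAtEnd holds 1, nil and 3; viaNil holds only 1."
```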

>> I think/feel like a proper EOF exception would be better, more correct.
>> 
>> [ [ stream next. "..." true ] on: EOF do: [ false ] ] whileTrue.
> 
> I agree, but the email thread Nicolas pointed to raises some
> performance questions about this approach.  It should be
> straightforward to do a basic performance comparison, which I'll get
> around to if no other objections are raised.
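For comparison, a rough playground sketch of the exception-based loop quoted above. EOF is not an existing Pharo class, so the sketch creates a throwaway Error subclass named SketchEOF (a hypothetical name chosen for illustration) and wraps a ReadStream in a block that signals it at the end instead of answering nil. A real implementation would signal from the stream's own #next (or its primitive); the end test inside the block is just ReadStream>>atEnd, and the sketch says nothing about performance.

```smalltalk
"Sketch only: SketchEOF is a hypothetical class created here for illustration."
| eofClass stream nextOrSignal items |
eofClass := Error
	subclass: #SketchEOF
	instanceVariableNames: ''
	classVariableNames: ''
	package: 'Sketch-Streams'.

stream := ReadStream on: { 1. nil. 3 }.

"Stand-in for an exception-raising #next: signal instead of answering nil."
nextOrSignal := [ stream atEnd
	ifTrue: [ eofClass signal ]
	ifFalse: [ stream next ] ].

"Same loop shape as above: the body answers true, the handler answers false."
items := OrderedCollection new.
[ [ items add: nextOrSignal value. true ]
	on: eofClass
	do: [ :ex | ex return: false ] ] whileTrue.

"items holds 1, nil and 3: the embedded nil no longer ends the loop."
```

Because the end of the stream is signalled rather than encoded as a value, the embedded nil is collected like any other element.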

Reading in bigger blocks using #readInto:startingAt:count: (which is basically 
the Unix read(2) system call) would solve the performance problems, I think.
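A minimal sketch of the block-reading style, assuming #readInto:startingAt:count: answers the number of elements actually read, and 0 once nothing is left (mirroring read(2) returning 0 at end of file). The ReadStream on a String and the 4-element buffer are arbitrary choices that only make it runnable in a playground.

```smalltalk
"Sketch: read in blocks until a read answers 0 elements."
| stream buffer count result |
stream := ReadStream on: 'hello, world'.
buffer := String new: 4.
result := WriteStream on: String new.

"Each pass copies at most buffer size elements; a count of 0 ends the loop."
[ (count := stream readInto: buffer startingAt: 1 count: buffer size) > 0 ]
	whileTrue: [ result nextPutAll: (buffer copyFrom: 1 to: count) ].

result contents
```

The loop needs neither #atEnd nor a nil check per element: the end is reported once per block, as a count.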

>> Will we throw away #atEnd then ? Do we need it if we cannot use it ?
> 
> Unix file I/O returns EOF if the end of file has been reached OR if an
> error occurs.  You should still check #atEnd after reading past the
> end of the file to make sure no error occurred.  Another part of the
> primitive change I'm proposing is to return additional information
> about what went wrong in the event of an error.

I am sorry, but this kind of semantics (the OR) is way too complex at the 
general image level; it is too specific and depends on particular underlying 
implementation details.

Sven

> We could modify the read primitive so that it fails if an error has
> occurred, and then #atEnd wouldn't be required.
> 
> Cheers,
> Alistair
> 
> 
> 
>>> On 4 Apr 2018, at 12:41, Alistair Grant <[email protected]> wrote:
>>> 
>>> Hi Nicolas,
>>> 
>>> On 4 April 2018 at 12:36, Nicolas Cellier
>>> <[email protected]> wrote:
>>>> 
>>>> 
>>>> 2018-04-04 12:18 GMT+02:00 Alistair Grant <[email protected]>:
>>>>> 
>>>>> Hi Sven,
>>>>> 
>>>>> On Wed, Apr 04, 2018 at 11:32:02AM +0200, Sven Van Caekenberghe wrote:
>>>>>> Somehow, somewhere there was a change to the implementation of the
>>>>>> primitive called by some streams' #atEnd.
>>>>> 
>>>>> That's a proposed change by me, but it hasn't been integrated yet.  So
>>>>> the discussion below should apply to the current stable VM (from August
>>>>> last year).
>>>>> 
>>>>> 
>>>>>> IIRC, someone said it is implemented as 'remaining size being zero'
>>>>>> and some virtual unix files like /dev/random are zero sized.
>>>>> 
>>>>> Currently, for files other than stdio (stdout, stderr, stdin) it is
>>>>> effectively defined as:
>>>>> 
>>>>> atEnd := stream position >= stream size
>>>>> 
>>>>> 
>>>>> And, as you say, plenty of virtual unix files report size 0.
>>>>> 
>>>>> 
>>>>> 
>>>>>> Now, all kinds of changes are being made image side to work around this.
>>>>> 
>>>>> I would phrase this slightly differently :-)
>>>>> 
>>>>> Some code does the right thing, while other code doesn't.  E.g.:
>>>>> 
>>>>> MultiByteFileStream>>upToEnd is good, while
>>>>> FileStream>>contents is incorrect
>>>>> 
>>>>> 
>>>>>> I am a strong believer in simple, real (i.e. infinite) streams, but I
>>>>>> am not sure we are doing the right thing here.
>>>>>> 
>>>>>> Point is, I am not sure #next returning nil is official and universal.
>>>>>> 
>>>>>> Consider the comments:
>>>>>> 
>>>>>> Stream>>#next
>>>>>> "Answer the next object accessible by the receiver."
>>>>>> 
>>>>>> ReadStream>>#next
>>>>>> "Primitive. Answer the next object in the Stream represented by the
>>>>>> receiver. Fail if the collection of this stream is not an Array or a
>>>>>> String.
>>>>>> Fail if the stream is positioned at its end, or if the position is out
>>>>>> of bounds in the collection. Optional. See Object documentation
>>>>>> whatIsAPrimitive."
>>>>>> 
>>>>>> Note how there is no talk about returning nil !
>>>>>> 
>>>>>> I think we should discuss about this first.
>>>>>> 
>>>>>> Was the low level change really correct and the right thing to do ?
>>>>> 
>>>>> The primitive change proposed doesn't affect this discussion.  It will
>>>>> mean that #atEnd returns false (correctly) sometimes, while currently it
>>>>> returns true (incorrectly).  The end result is still incorrect, e.g.
>>>>> #contents returns an empty string for /proc/cpuinfo.
>>>>> 
>>>>> You're correct about no mention of nil, but we have:
>>>>> 
>>>>> FileStream>>next
>>>>> 
>>>>>       (position >= readLimit and: [self atEnd])
>>>>>               ifTrue: [^nil]
>>>>>               ifFalse: [^collection at: (position := position + 1)]
>>>>> 
>>>>> 
>>>>> which has been around for a long time (I suspect, before Pharo existed).
>>>>> 
>>>>> Having said that, I think that raising an exception is a better
>>>>> solution, but it is a much, much bigger change than the one I proposed
>>>>> in https://github.com/pharo-project/pharo/pull/1180.
>>>>> 
>>>>> 
>>>>> Cheers,
>>>>> Alistair
>>>>> 
>>>> 
>>>> Hi,
>>>> yes, if you are after universal behavior encompassing Unix streams,
>>>> exceptions might be the best way, because on special streams you can't
>>>> always tell in advance: you have to try.
>>>> That's the solution adopted by the authors of Xtreams.
>>>> But there is a runtime penalty associated with it.
>>>> 
>>>> The penalty was once so high that my proposal to generalize EndOfStream
>>>> usage was rejected a few years ago by Andreas Raab.
>>>> http://forum.world.st/EndOfStream-unused-td68806.html
>>> 
>>> Thanks for this, I'll definitely take a look.
>>> 
>>> Do you have a sense of how Denis' suggestion of using an EndOfStream
>>> object would compare?
>>> 
>>> It would keep the same coding style, but avoid the problems with nil.
>>> 
>>> Thanks,
>>> Alistair
>>> 
>>> 
>>> 
>>>> I used to benchmark Xtreams regularly, but stopped a few years ago.
>>>> Maybe I can dig the benchmarks up and run them on a newer VM.
>>>> 
>>>> In the meantime, I experimented with a programmable end-of-stream
>>>> behavior (via a block, or any other valuable),
>>>> http://www.squeaksource.com/XTream.htm
>>>> to reconcile performance and universality, but it added complexity
>>>> on the implementation side.
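A minimal sketch of the programmable end-of-stream idea. This is not the XTream API; nextOr and endMarker are hypothetical names. The reading block takes the end-of-stream action as an argument, so each caller decides what the end looks like. Here it answers a unique sentinel object compared with #==, which is also the shape of the EndOfStream-object suggestion mentioned above. As before, the end test inside the block is just ReadStream>>atEnd.

```smalltalk
"Sketch: the caller supplies what happens at end of stream (hypothetical names)."
| stream nextOr endMarker items element |
stream := ReadStream on: { 1. nil. 3 }.
nextOr := [ :endAction | stream atEnd
	ifTrue: [ endAction value ]
	ifFalse: [ stream next ] ].

"Policy used here: a unique sentinel, compared with #== so nil stays ordinary data."
endMarker := Object new.
items := OrderedCollection new.
[ (element := nextOr value: [ endMarker ]) == endMarker ]
	whileFalse: [ items add: element ].

"Other policies (not run here): [ nil ] reproduces today's #next behavior;
 a block that signals an exception gives the exception-based style."
```

The price of this flexibility is that every read goes through an extra block evaluation, which is the implementation-side cost hinted at above.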
>>>> 
>>>> Nicolas
>>>> 
>>>>> 
>>>>> 
>>>>>> Note also that Guille introduced something new, #closed, which is
>>>>>> related to the difference between having no more elements (maybe just
>>>>>> right now, like an open network stream) and never ever being able to
>>>>>> produce more data.
>>>>>> 
>>>>>> Sven
>> 
>> 
>
