Re: [Pharo-dev] Changed #atEnd primitive - #atEnd vs #next returning nil

Alistair Grant Tue, 10 Apr 2018 09:31:33 -0700

First a quick update:

After doing some work on primitiveFileAtEnd, #atEnd now answers
correctly for files that don't report their size correctly, e.g.
/dev/urandom and /proc/cpuinfo, whether the files are opened directly or
redirected through stdin.


However, determining whether stdin from a terminal has reached the end
of file can't be done without making #atEnd blocking, since we have to
wait for the user to flag the end of file, e.g. by typing Ctrl-D.  And
#atEnd is assumed to be non-blocking.

So currently using ZnCharacterReadStream with stdin from a terminal will
result in a stack dump similar to:

MessageNotUnderstood: receiver of "<" is nil
UndefinedObject(Object)>>doesNotUnderstand: #<
ZnUTF8Encoder>>nextCodePointFromStream:
ZnUTF8Encoder(ZnCharacterEncoder)>>nextFromStream:
ZnCharacterReadStream>>nextElement
ZnCharacterReadStream(ZnEncodedReadStream)>>next
UndefinedObject>>DoIt


Going back through the various suggestions that have been made regarding
using a sentinel object vs. raising a notification / exception, my
(still to be polished) suggestion is to:

1. Add an endOfStream instance variable
2. When the end of the stream is reached answer the value of the
   instance variable (i.e. the result of sending #value to the variable).
3. The initial default value would be a block that raises a Deprecation
   warning and then returns nil.  This would allow existing code to
   function for a changeover period.
4. At the end of the deprecation period the default value would be
   changed to a unique sentinel object which would answer itself as its
   #value.

At any time users of the stream can set their own sentinel, including a
block that raises an exception.
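
As a rough illustration of points 1 to 4 (a Playground sketch only, not
the proposed implementation): the snippet below emulates the behaviour
with plain blocks around an in-memory ReadStream.  The names sentinel,
endOfStream and nextOrEnd are made up for this example, and #atEnd is
used here only because it is reliable for in-memory collections.

```smalltalk
"Hypothetical sketch: emulates a configurable end-of-stream value.
 sentinel, endOfStream and nextOrEnd are illustrative names only."
| source sentinel endOfStream nextOrEnd collected element |
source := ReadStream on: #(1 nil 3).   "nil is a legitimate element here"
sentinel := Object new.                "unique end marker"
endOfStream := [ sentinel ].           "anything that responds to #value"
nextOrEnd := [ source atEnd
	ifTrue: [ endOfStream value ]
	ifFalse: [ source next ] ].
collected := OrderedCollection new.
"Identity comparison, so an embedded nil does not end the loop early."
[ (element := nextOrEnd value) == sentinel ]
	whileFalse: [ collected add: element ].
collected                              "holds 1, nil and 3"
```

A caller that prefers an exception could, under the same assumptions,
set endOfStream := [ Error signal: 'end of stream' ] instead.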


Cheers,
Alistair


On 4 April 2018 at 19:24, Stephane Ducasse <[email protected]> wrote:
> Thanks for this discussion.
>
> On Wed, Apr 4, 2018 at 1:37 PM, Sven Van Caekenberghe <[email protected]> wrote:
>> Alistair,
>>
>> First off, thanks for the discussions and your contributions, I really 
>> appreciate them.
>>
>> But I want to have a discussion at the high level of the definition and 
>> semantics of the stream API in Pharo.
>>
>>> On 4 Apr 2018, at 13:20, Alistair Grant <[email protected]> wrote:
>>>
>>> On 4 April 2018 at 12:56, Sven Van Caekenberghe <[email protected]> wrote:
>>>> Playing a bit devil's advocate, the idea is that, in general,
>>>>
>>>> [ stream atEnd] whileFalse: [ stream next. "..." ].
>>>>
>>>> is no longer allowed ?
>>>
>>> It hasn't been allowed "forever" [1].  It's just been misused for
>>> almost as long.
>>>
>>> [1] Time began when stdio stream support was introduced. :-)
>>
>> I am still not convinced. Another way to put it would be that the old #atEnd 
>> or #upToEnd do not make sense for these streams and some new loop is needed, 
>> based on a new test (it exists for socket streams already).
>>
>> [ stream isDataAvailable ] whileTrue: [ stream next ]
>>
>>>> And you want to replace it with
>>>>
>>>> [ stream next ifNil: [ false ] ifNotNil: [ :x | "..." true ] ] whileTrue.
>>>>
>>>> That is a pretty big change, no ?
>>>
>>> That's the way quite a bit of code already operates.
>>>
>>> As Denis pointed out, it's obviously problematic in the general sense,
>>> since nil can be embedded in non-byte-oriented streams.  I suspect
>>> that in practice not many people write code that reads from both
>>> byte-oriented and non-byte-oriented streams.
>>
>> Maybe yes, maybe no. As Denis' example shows there is a clear definition 
>> problem.
>>
>> And I do use streams of byte arrays or strings all the time, this is really 
>> important. I want my parsers to work on all kinds of streams.
>>
>>>> I think/feel like a proper EOF exception would be better, more correct.
>>>>
>>>> [ [ stream next. "..." true ] on: EOF do: [ false ] ] whileTrue.
>>>
>>> I agree, but the email thread Nicolas pointed to raises some
>>> performance questions about this approach.  It should be
>>> straightforward to do a basic performance comparison which I'll get
>>> around to if other objections aren't raised.
>>
>> Reading in bigger blocks, using #readInto:startingAt:count: (which is
>> basically Unix's read(2) system call), would solve performance problems, I
>> think.
>>
>>>> Will we throw away #atEnd then ? Do we need it if we cannot use it ?
>>>
>>> Unix file i/o returns EOF if the end of file has been reached OR if an
>>> error occurs.  You should still check #atEnd after reading past the
>>> end of the file to make sure no error occurred.  Another part of the
>>> primitive change I'm proposing is to return additional information
>>> about what went wrong in the event of an error.
>>
>> I am sorry, but this kind of semantics (the OR) is way too complex at the 
>> general image level, it is too specific and based on certain underlying 
>> implementation details.
>>
>> Sven
>>
>>> We could modify the read primitive so that it fails if an error has
>>> occurred, and then #atEnd wouldn't be required.
>>>
>>> Cheers,
>>> Alistair
>>>
>>>
>>>
>>>>> On 4 Apr 2018, at 12:41, Alistair Grant <[email protected]> wrote:
>>>>>
>>>>> Hi Nicolas,
>>>>>
>>>>> On 4 April 2018 at 12:36, Nicolas Cellier
>>>>> <[email protected]> wrote:
>>>>>>
>>>>>>
>>>>>> 2018-04-04 12:18 GMT+02:00 Alistair Grant <[email protected]>:
>>>>>>>
>>>>>>> Hi Sven,
>>>>>>>
>>>>>>> On Wed, Apr 04, 2018 at 11:32:02AM +0200, Sven Van Caekenberghe wrote:
>>>>>>>> Somehow, somewhere there was a change to the implementation of the
>>>>>>>> primitive called by some streams' #atEnd.
>>>>>>>
>>>>>>> That's a proposed change by me, but it hasn't been integrated yet.  So
>>>>>>> the discussion below should apply to the current stable vm (from August
>>>>>>> last year).
>>>>>>>
>>>>>>>
>>>>>>>> IIRC, someone said it is implemented as 'remaining size being zero'
>>>>>>>> and some virtual unix files like /dev/random are zero sized.
>>>>>>>
>>>>>>> Currently, for files other than stdio (stdout, stderr, stdin) it is
>>>>>>> effectively defined as:
>>>>>>>
>>>>>>> atEnd := stream position >= stream size
>>>>>>>
>>>>>>>
>>>>>>> And, as you say, plenty of virtual unix files report size 0.
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>> Now, all kinds of changes are being done image side to work around
>>>>>>>> this.
>>>>>>>
>>>>>>> I would phrase this slightly differently :-)
>>>>>>>
>>>>>>> Some code does the right thing, while other code doesn't.  E.g.:
>>>>>>>
>>>>>>> MultiByteFileStream>>upToEnd is good, while
>>>>>>> FileStream>>contents is incorrect
>>>>>>>
>>>>>>>
>>>>>>>> I am a strong believer in simple, real (i.e. infinite) streams, but I
>>>>>>>> am not sure we are doing the right thing here.
>>>>>>>>
>>>>>>>> Point is, I am not sure #next returning nil is official and universal.
>>>>>>>>
>>>>>>>> Consider the comments:
>>>>>>>>
>>>>>>>> Stream>>#next
>>>>>>>> "Answer the next object accessible by the receiver."
>>>>>>>>
>>>>>>>> ReadStream>>#next
>>>>>>>> "Primitive. Answer the next object in the Stream represented by the
>>>>>>>> receiver. Fail if the collection of this stream is not an Array or a
>>>>>>>> String.
>>>>>>>> Fail if the stream is positioned at its end, or if the position is out
>>>>>>>> of bounds in the collection. Optional. See Object documentation
>>>>>>>> whatIsAPrimitive."
>>>>>>>>
>>>>>>>> Note how there is no talk about returning nil !
>>>>>>>>
>>>>>>>> I think we should discuss this first.
>>>>>>>>
>>>>>>>> Was the low level change really correct and the right thing to do ?
>>>>>>>
>>>>>>> The primitive change proposed doesn't affect this discussion.  It will
>>>>>>> mean that #atEnd returns false (correctly) sometimes, while currently it
>>>>>>> returns true (incorrectly).  The end result is still incorrect, e.g.
>>>>>>> #contents returns an empty string for /proc/cpuinfo.
>>>>>>>
>>>>>>> You're correct about no mention of nil, but we have:
>>>>>>>
>>>>>>> FileStream>>next
>>>>>>>
>>>>>>>       (position >= readLimit and: [self atEnd])
>>>>>>>               ifTrue: [^nil]
>>>>>>>               ifFalse: [^collection at: (position := position + 1)]
>>>>>>>
>>>>>>>
>>>>>>> which has been around for a long time (I suspect, before Pharo existed).
>>>>>>>
>>>>>>> Having said that, I think that raising an exception is a better
>>>>>>> solution, but it is a much, much bigger change than the one I proposed
>>>>>>> in https://github.com/pharo-project/pharo/pull/1180.
>>>>>>>
>>>>>>>
>>>>>>> Cheers,
>>>>>>> Alistair
>>>>>>>
>>>>>>
>>>>>> Hi,
>>>>>> yes, if you are after universal behavior encompassing Unix streams, the
>>>>>> Exception might be the best way.
>>>>>> Because on special streams you can't always say in advance, you have to
>>>>>> try.
>>>>>> That's the solution adopted by authors of Xtreams.
>>>>>> But there is a runtime penalty associated to it.
>>>>>>
>>>>>> The penalty once was so high that my proposal to generalize EndOfStream
>>>>>> usage was rejected a few years ago by Andreas Raab.
>>>>>> http://forum.world.st/EndOfStream-unused-td68806.html
>>>>>
>>>>> Thanks for this, I'll definitely take a look.
>>>>>
>>>>> Do you have a sense of how Denis' suggestion of using an EndOfStream
>>>>> object would compare?
>>>>>
>>>>> It would keep the same coding style, but avoid the problems with nil.
>>>>>
>>>>> Thanks,
>>>>> Alistair
>>>>>
>>>>>
>>>>>
>>>>>> I used to benchmark Xtreams regularly, but stopped a few years ago.
>>>>>> Maybe I can dig the benchmarks out and run them on a newer VM.
>>>>>>
>>>>>> In the meantime, I had experimented with a programmable end of stream
>>>>>> behavior (via a block, or any other valuable)
>>>>>> http://www.squeaksource.com/XTream.htm
>>>>>> so as to reconcile performance and universality, but it added
>>>>>> complexity on the implementation side.
>>>>>>
>>>>>> Nicolas
>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>> Note also that Guille introduced something new, #closed, which is
>>>>>>>> related to the difference between having no more elements (maybe right
>>>>>>>> now, like an open network stream) and never ever being able to produce
>>>>>>>> more data.
>>>>>>>>
>>>>>>>> Sven
>>>>
>>>>
>>>
>>
>>
>
