Re: [Pharo-dev] Changed #atEnd primitive - #atEnd vs #next returning nil

Alistair Grant Wed, 11 Apr 2018 01:30:45 -0700

Hi Denis,

On 11 April 2018 at 10:02, Denis Kudriashov <dionisi...@gmail.com> wrote:
>
> 2018-04-11 8:32 GMT+02:00 Alistair Grant <akgrant0...@gmail.com>:
>>
>> >>> Where is it being said that #next and/or #atEnd should be blocking or
>> >>> non-blocking ?
>> >>
>> >> There is existing code that assumes that #atEnd is non-blocking and
>> >> that #next is allowed block.  I believe that we should keep those
>> >> conditions.
>> >
>> > I fail to see where that is written down, either way. Can you point me
>> > to comments stating that, I would really like to know ?
>>
>> I'm not aware of it being written down, just that ever existing
>> implementation I'm aware of behaves this way.
>>
>> On the other hand, making #atEnd blocking breaks Eliot's REPL sample
>> (in Squeak).
>
>
> Could you write here this example, please?


The code is loaded in squeak using:

https://github.com/OpenSmalltalk/opensmalltalk-vm/blob/Cog/image/buildspurtrunkreaderimage.sh

for 32 bit images.  It loads:

https://github.com/OpenSmalltalk/opensmalltalk-vm/blob/Cog/image/LoadReader.st

which loads package CogTools-Listener in http://source.squeak.org/VMMaker

An image that automatically runs the code and nothing else is created in:

https://github.com/OpenSmalltalk/opensmalltalk-vm/blob/Cog/image/StartReader.st


If you want to run it interactively you can load CogTools-Listener and
do something like:

StdioListener new
    quitOnEof: false;
    run

If you modify #atEnd to block it will result in the "squeak>" input
prompt being printed in the terminal after the input has been entered.

The code can be loaded in to Pharo and basically works, but the output
tends to be hidden behind the next input prompt because it uses #cr
instead of #lf.  You can easily modify StdioListener>>initialize to
set the line end convention in stdout.

NOTE: It is not intended to be a release quality implementation of a
evaluation loop.  The whole purpose as I understand it is for it to be
as simple as possible to assist in tracking down issues using the VM
simulator.  It runs minimal code to get to the point of waiting for
user input and then allows an expression that causes problems to be
entered and traced using the simulator.

Cheers,
Alistair



>> >>> How is this related to how EOF is signalled ?
>> >>
>> >> Because, combined with terminal EOF not being known until the user
>> >> explicitly flags it (with Ctrl-D) it means that #atEnd can't be used
>> >> for iterating over input from stdin connected to a terminal.
>> >
>> > This seems to me like an exception that only holds for one particular
>> > stream in one particular scenario (interactive stdin). I might be wrong.
>> >
>> >>> It seems to me that there are quite a few classes of streams that are
>> >>> 'special' in the sense that #next could be blocking and/or #atEnd could 
>> >>> be
>> >>> unclear - socket/network streams, serial streams, maybe stdio 
>> >>> (interactive
>> >>> or not). Without a message like #isDataAvailable you cannot handle those
>> >>> without blocking.
>> >>
>> >> Right.  I think this is a distraction (I was trying to explain some
>> >> details, but it's causing more confusion instead of helping).
>> >>
>> >> The important point is that #atEnd doesn't work for iterating over
>> >> streams with terminal input
>> >
>> > Maybe you should also point to the actual code that fails. I mean you
>> > showed a partial stack trace, but not how you got there, precisely. How 
>> > does
>> > the application reading from an interactive stdin do to get into trouble ?
>>
>> Included below.
>>
>>
>> >>> Reading from stdin seems like a very rare case for a Smalltalk system
>> >>> (not that it should not be possible).
>> >>
>> >> There's been quite a bit of discussion and several projects recently
>> >> related to using pharo for scripting, so it may become more common.
>> >> E.g.
>> >>
>> >>
>> >> https://www.quora.com/Can-Smalltalk-be-a-batch-file-scripting-language/answer/Philippe-Back-1?share=c19bfc95
>> >> https://github.com/rajula96reddy/pharo-cli
>> >
>> > Still, it is not common at all.
>> >
>> >>> I have a feeling that too much functionality is being pushed into too
>> >>> small an API.
>> >>
>> >> This is just about how should Zinc streams be iterating over the
>> >> underlying streams.  You didn't like checking the result of #next for
>> >> nil since it isn't general, correctly pointing out that nil is a valid
>> >> value for non-byte oriented streams.  But #atEnd doesn't work for
>> >> stdin from a terminal.
>> >>
>> >>
>> >> At this point I think there are three options:
>> >>
>> >> 1. Modify Zinc to check the return value of #next instead of using
>> >> #atEnd.
>> >>
>> >> This is what all existing character / byte oriented streams in Squeak
>> >> and Pharo do.  At that point the Zinc streams can be used on all file
>> >> / stdio input and output.
>> >
>> > I agree that such code exists in many places, but there is lots of
>> > stream reading that does not check for nils.
>>
>> Right.  Streams can be categorised in many ways, but for this
>> discussion I think streams are broken in to two types:
>>
>> 1) Byte / Character oriented
>> 2) All others
>>
>> For historical reasons, byte / character oriented streams need to
>> check for EOF by using "stream next == nil" and all other streams
>> should use #atEnd.
>>
>> This avoids the "nil being part of the domain" issue that was
>> discussed earlier in the thread.
>>
>>
>> >> 2. Modify all streams to signal EOF in some other way, i.e. a sentinel
>> >> or notification / exception.
>> >>
>> >> This is what we were discussing below.  But it is a decent chunk of
>> >> work with significant impact on the existing code base.
>> >
>> > Agreed. This would be a future extension.
>> >
>> >> 3. Require anyone who wants to read from stdin to code around Zinc's
>> >> inability to handle terminal input.
>> >>
>> >> I'd prefer to avoid this option if possible.
>> >
>> > See higher for a more concrete usage example request.
>>
>>
>> testAtEnd.st
>> --
>> | ch stream string stdin |
>>
>> 'stdio.cs' asFileReference fileIn.
>> "stdin := FileStream stdin."
>> stdin := ZnCharacterReadStream on:
>>     (ZnBufferedReadStream on:
>>         Stdio stdin).
>> stream := (String new: 100) writeStream.
>> ch := stdin next.
>> [ ch == nil ] whileFalse: [
>>     stream nextPut: ch.
>>     ch := stdin next. ].
>> string := stream contents.
>> FileStream stdout
>>     nextPutAll: string; lf;
>>     nextPutAll: 'Characters read: ';
>>     nextPutAll: string size asString;
>>     lf.
>> Smalltalk snapshot: false andQuit: true.
>> --
>>
>> Execute with:
>>
>> ./pharo --headless Pharo7.0-64bit-e76f1a2.image testAtEnd.st
>>
>> and type Ctrl-D gives:
>>
>>
>> 'Errors in script loaded from testAtEnd.st'
>> MessageNotUnderstood: receiver of "<" is nil
>> UndefinedObject(Object)>>doesNotUnderstand: #<
>> ZnUTF8Encoder>>nextCodePointFromStream:
>> ZnUTF8Encoder(ZnCharacterEncoder)>>nextFromStream:
>> ZnCharacterReadStream>>nextElement
>> ZnCharacterReadStream(ZnEncodedReadStream)>>next
>> UndefinedObject>>DoIt
>> OpalCompiler>>evaluate
>>
>>
>> Using #atEnd to control the loop instead of "stdin next == nil"
>> produces the same result.
>>
>> Replacing stdin with FileStream stdin makes the script work.
>>
>> stdio.cs fixes a bug in StdioStream which really isn't part of this
>> discussion (PR to be submitted).
>>
>> Cheers,
>> Alistair
>>
>>
>>
>>
>> >> Does that clarify the situation?
>> >
>> > Yes, it helps. Thanks. But questions remain.
>> >
>> >> Thanks,
>> >> Alistair
>> >>
>> >>
>> >>
>> >>>> On 10 Apr 2018, at 18:30, Alistair Grant <akgrant0...@gmail.com>
>> >>>> wrote:
>> >>>>
>> >>>> First a quick update:
>> >>>>
>> >>>> After doing some work on primitiveFileAtEnd, #atEnd now answers
>> >>>> correctly for files that don't report their size correctly, e.g.
>> >>>> /dev/urandom and /proc/cpuinfo, whether the files are opened directly
>> >>>> or
>> >>>> redirected through stdin.
>> >>>>
>> >>>> However determining whether stdin from a terminal has reached the end
>> >>>> of
>> >>>> file can't be done without making #atEnd blocking since we have to
>> >>>> wait
>> >>>> for the user to flag the end of file, e.g. by typing Ctrl-D.  And
>> >>>> #atEnd
>> >>>> is assumed to be non-blocking.
>> >>>>
>> >>>> So currently using ZnCharacterReadStream with stdin from a terminal
>> >>>> will
>> >>>> result in a stack dump similar to:
>> >>>>
>> >>>> MessageNotUnderstood: receiver of "<" is nil
>> >>>> UndefinedObject(Object)>>doesNotUnderstand: #<
>> >>>> ZnUTF8Encoder>>nextCodePointFromStream:
>> >>>> ZnUTF8Encoder(ZnCharacterEncoder)>>nextFromStream:
>> >>>> ZnCharacterReadStream>>nextElement
>> >>>> ZnCharacterReadStream(ZnEncodedReadStream)>>next
>> >>>> UndefinedObject>>DoIt
>> >>>>
>> >>>>
>> >>>> Going back through the various suggestions that have been made
>> >>>> regarding
>> >>>> using a sentinel object vs. raising a notification / exception, my
>> >>>> (still to be polished) suggestion is to:
>> >>>>
>> >>>> 1. Add an endOfStream instance variable
>> >>>> 2. When the end of the stream is reached answer the value of the
>> >>>>  instance variable (i.e. the result of sending #value to the
>> >>>> variable).
>> >>>> 3. The initial default value would be a block that raises a
>> >>>> Deprecation
>> >>>>  warning and then returns nil.  This would allow existing code to
>> >>>>  function for a changeover period.
>> >>>> 4. At the end of the deprecation period the default value would be
>> >>>>  changed to a unique sentinel object which would answer itself as its
>> >>>>  #value.
>> >>>>
>> >>>> At any time users of the stream can set their own sentinel, including
>> >>>> a
>> >>>> block that raises an exception.
>> >>>>
>> >>>>
>> >>>> Cheers,
>> >>>> Alistair
>> >>>>
>> >>>>
>> >>>> On 4 April 2018 at 19:24, Stephane Ducasse <stepharo.s...@gmail.com>
>> >>>> wrote:
>> >>>>> Thanks for this discussion.
>> >>>>>
>> >>>>> On Wed, Apr 4, 2018 at 1:37 PM, Sven Van Caekenberghe <s...@stfx.eu>
>> >>>>> wrote:
>> >>>>>> Alistair,
>> >>>>>>
>> >>>>>> First off, thanks for the discussions and your contributions, I
>> >>>>>> really appreciate them.
>> >>>>>>
>> >>>>>> But I want to have a discussion at the high level of the definition
>> >>>>>> and semantics of the stream API in Pharo.
>> >>>>>>
>> >>>>>>> On 4 Apr 2018, at 13:20, Alistair Grant <akgrant0...@gmail.com>
>> >>>>>>> wrote:
>> >>>>>>>
>> >>>>>>> On 4 April 2018 at 12:56, Sven Van Caekenberghe <s...@stfx.eu>
>> >>>>>>> wrote:
>> >>>>>>>> Playing a bit devil's advocate, the idea is that, in general,
>> >>>>>>>>
>> >>>>>>>> [ stream atEnd] whileFalse: [ stream next. "..." ].
>> >>>>>>>>
>> >>>>>>>> is no longer allowed ?
>> >>>>>>>
>> >>>>>>> It hasn't been allowed "forever" [1].  It's just been misused for
>> >>>>>>> almost as long.
>> >>>>>>>
>> >>>>>>> [1] Time began when stdio stream support was introduced. :-)
>> >>>>>>
>> >>>>>> I am still not convinced. Another way to put it would be that the
>> >>>>>> old #atEnd or #upToEnd do not make sense for these streams and some 
>> >>>>>> new loop
>> >>>>>> is needed, based on a new test (it exists for socket streams already).
>> >>>>>>
>> >>>>>> [ stream isDataAvailable ] whileTrue: [ stream next ]
>> >>>>>>
>> >>>>>>>> And you want to replace it with
>> >>>>>>>>
>> >>>>>>>> [ stream next ifNil: [ false ] ifNotNil: [ :x | "..." true ]
>> >>>>>>>> whileTrue.
>> >>>>>>>>
>> >>>>>>>> That is a pretty big change, no ?
>> >>>>>>>
>> >>>>>>> That's the way quite a bit of code already operates.
>> >>>>>>>
>> >>>>>>> As Denis pointed out, it's obviously problematic in the general
>> >>>>>>> sense,
>> >>>>>>> since nil can be embedded in non-byte oriented streams.  I suspect
>> >>>>>>> that in practice not many people write code that reads streams
>> >>>>>>> from
>> >>>>>>> both byte oriented and non-byte oriented streams.
>> >>>>>>
>> >>>>>> Maybe yes, maybe no. As Denis' example shows there is a clear
>> >>>>>> definition problem.
>> >>>>>>
>> >>>>>> And I do use streams of byte arrays or strings all the time, this
>> >>>>>> is really important. I want my parsers to work on all kinds of 
>> >>>>>> streams.
>> >>>>>>
>> >>>>>>>> I think/feel like a proper EOF exception would be better, more
>> >>>>>>>> correct.
>> >>>>>>>>
>> >>>>>>>> [ [ stream next. "..." true ] on: EOF do: [ false ] ] whileTrue.
>> >>>>>>>
>> >>>>>>> I agree, but the email thread Nicolas pointed to raises some
>> >>>>>>> performance questions about this approach.  It should be
>> >>>>>>> straightforward to do a basic performance comparison which I'll
>> >>>>>>> get
>> >>>>>>> around to if other objections aren't raised.
>> >>>>>>
>> >>>>>> Reading in bigger blocks, using #readInto:startingAt:count: (which
>> >>>>>> is basically Unix's (2) Read sys call), would solve performance 
>> >>>>>> problems, I
>> >>>>>> think.
>> >>>>>>
>> >>>>>>>> Will we throw away #atEnd then ? Do we need it if we cannot use
>> >>>>>>>> it ?
>> >>>>>>>
>> >>>>>>> Unix file i/o returns EOF if the end of file has been reach OR if
>> >>>>>>> an
>> >>>>>>> error occurs.  You should still check #atEnd after reading past
>> >>>>>>> the
>> >>>>>>> end of the file to make sure no error occurred.  Another part of
>> >>>>>>> the
>> >>>>>>> primitive change I'm proposing is to return additional information
>> >>>>>>> about what went wrong in the event of an error.
>> >>>>>>
>> >>>>>> I am sorry, but this kind of semantics (the OR) is way too complex
>> >>>>>> at the general image level, it is too specific and based on certain
>> >>>>>> underlying implementation details.
>> >>>>>>
>> >>>>>> Sven
>> >>>>>>
>> >>>>>>> We could modify the read primitive so that it fails if an error
>> >>>>>>> has
>> >>>>>>> occurred, and then #atEnd wouldn't be required.
>> >>>>>>>
>> >>>>>>> Cheers,
>> >>>>>>> Alistair
>> >>>>>>>
>> >>>>>>>
>> >>>>>>>
>> >>>>>>>>> On 4 Apr 2018, at 12:41, Alistair Grant <akgrant0...@gmail.com>
>> >>>>>>>>> wrote:
>> >>>>>>>>>
>> >>>>>>>>> Hi Nicolas,
>> >>>>>>>>>
>> >>>>>>>>> On 4 April 2018 at 12:36, Nicolas Cellier
>> >>>>>>>>> <nicolas.cellier.aka.n...@gmail.com> wrote:
>> >>>>>>>>>>
>> >>>>>>>>>>
>> >>>>>>>>>> 2018-04-04 12:18 GMT+02:00 Alistair Grant
>> >>>>>>>>>> <akgrant0...@gmail.com>:
>> >>>>>>>>>>>
>> >>>>>>>>>>> Hi Sven,
>> >>>>>>>>>>>
>> >>>>>>>>>>> On Wed, Apr 04, 2018 at 11:32:02AM +0200, Sven Van
>> >>>>>>>>>>> Caekenberghe wrote:
>> >>>>>>>>>>>> Somehow, somewhere there was a change to the implementation
>> >>>>>>>>>>>> of the
>> >>>>>>>>>>>> primitive called by some streams' #atEnd.
>> >>>>>>>>>>>
>> >>>>>>>>>>> That's a proposed change by me, but it hasn't been integrated
>> >>>>>>>>>>> yet.  So
>> >>>>>>>>>>> the discussion below should apply to the current stable vm
>> >>>>>>>>>>> (from August
>> >>>>>>>>>>> last year).
>> >>>>>>>>>>>
>> >>>>>>>>>>>
>> >>>>>>>>>>>> IIRC, someone said it is implemented as 'remaining size being
>> >>>>>>>>>>>> zero'
>> >>>>>>>>>>>> and some virtual unix files like /dev/random are zero sized.
>> >>>>>>>>>>>
>> >>>>>>>>>>> Currently, for files other than sdio (stdout, stderr, stdin)
>> >>>>>>>>>>> it is
>> >>>>>>>>>>> effectively defined as:
>> >>>>>>>>>>>
>> >>>>>>>>>>> atEnd := stream position >= stream size
>> >>>>>>>>>>>
>> >>>>>>>>>>>
>> >>>>>>>>>>> And, as you say, plenty of virtual unix files report size 0.
>> >>>>>>>>>>>
>> >>>>>>>>>>>
>> >>>>>>>>>>>
>> >>>>>>>>>>>> Now, all kinds of changes are being done image size to work
>> >>>>>>>>>>>> around this.
>> >>>>>>>>>>>
>> >>>>>>>>>>> I would phrase this slightly differently :-)
>> >>>>>>>>>>>
>> >>>>>>>>>>> Some code does the right thing, while other code doesn't.
>> >>>>>>>>>>> E.g.:
>> >>>>>>>>>>>
>> >>>>>>>>>>> MultiByteFileStream>>upToEnd is good, while
>> >>>>>>>>>>> FileStream>>contents is incorrect
>> >>>>>>>>>>>
>> >>>>>>>>>>>
>> >>>>>>>>>>>> I am a strong believer in simple, real (i.e. infinite)
>> >>>>>>>>>>>> streams, but I
>> >>>>>>>>>>>> am not sure we are doing the right thing here.
>> >>>>>>>>>>>>
>> >>>>>>>>>>>> Point is, I am not sure #next returning nil is official and
>> >>>>>>>>>>>> universal.
>> >>>>>>>>>>>>
>> >>>>>>>>>>>> Consider the comments:
>> >>>>>>>>>>>>
>> >>>>>>>>>>>> Stream>>#next
>> >>>>>>>>>>>> "Answer the next object accessible by the receiver."
>> >>>>>>>>>>>>
>> >>>>>>>>>>>> ReadStream>>#next
>> >>>>>>>>>>>> "Primitive. Answer the next object in the Stream represented
>> >>>>>>>>>>>> by the
>> >>>>>>>>>>>> receiver. Fail if the collection of this stream is not an
>> >>>>>>>>>>>> Array or a
>> >>>>>>>>>>>> String.
>> >>>>>>>>>>>> Fail if the stream is positioned at its end, or if the
>> >>>>>>>>>>>> position is out
>> >>>>>>>>>>>> of
>> >>>>>>>>>>>> bounds in the collection. Optional. See Object documentation
>> >>>>>>>>>>>> whatIsAPrimitive."
>> >>>>>>>>>>>>
>> >>>>>>>>>>>> Note how there is no talk about returning nil !
>> >>>>>>>>>>>>
>> >>>>>>>>>>>> I think we should discuss about this first.
>> >>>>>>>>>>>>
>> >>>>>>>>>>>> Was the low level change really correct and the right thing
>> >>>>>>>>>>>> to do ?
>> >>>>>>>>>>>
>> >>>>>>>>>>> The primitive change proposed doesn't affect this discussion.
>> >>>>>>>>>>> It will
>> >>>>>>>>>>> mean that #atEnd returns false (correctly) sometimes, while
>> >>>>>>>>>>> currently it
>> >>>>>>>>>>> returns true (incorrectly).  The end result is still
>> >>>>>>>>>>> incorrect, e.g.
>> >>>>>>>>>>> #contents returns an empty string for /proc/cpuinfo.
>> >>>>>>>>>>>
>> >>>>>>>>>>> You're correct about no mention of nil, but we have:
>> >>>>>>>>>>>
>> >>>>>>>>>>> FileStream>>next
>> >>>>>>>>>>>
>> >>>>>>>>>>>     (position >= readLimit and: [self atEnd])
>> >>>>>>>>>>>             ifTrue: [^nil]
>> >>>>>>>>>>>             ifFalse: [^collection at: (position := position +
>> >>>>>>>>>>> 1)]
>> >>>>>>>>>>>
>> >>>>>>>>>>>
>> >>>>>>>>>>> which has been around for a long time (I suspect, before Pharo
>> >>>>>>>>>>> existed).
>> >>>>>>>>>>>
>> >>>>>>>>>>> Having said that, I think that raising an exception is a
>> >>>>>>>>>>> better
>> >>>>>>>>>>> solution, but it is a much, much bigger change than the one I
>> >>>>>>>>>>> proposed
>> >>>>>>>>>>> in https://github.com/pharo-project/pharo/pull/1180.
>> >>>>>>>>>>>
>> >>>>>>>>>>>
>> >>>>>>>>>>> Cheers,
>> >>>>>>>>>>> Alistair
>> >>>>>>>>>>>
>> >>>>>>>>>>
>> >>>>>>>>>> Hi,
>> >>>>>>>>>> yes, if you are after universal behavior englobing Unix
>> >>>>>>>>>> streams, the
>> >>>>>>>>>> Exception might be the best way.
>> >>>>>>>>>> Because on special stream you can't allways say in advance, you
>> >>>>>>>>>> have to try.
>> >>>>>>>>>> That's the solution adopted by authors of Xtreams.
>> >>>>>>>>>> But there is a runtime penalty associated to it.
>> >>>>>>>>>>
>> >>>>>>>>>> The penalty once was so high that my proposal to generalize
>> >>>>>>>>>> EndOfStream
>> >>>>>>>>>> usage was rejected a few years ago by AndreaRaab.
>> >>>>>>>>>> http://forum.world.st/EndOfStream-unused-td68806.html
>> >>>>>>>>>
>> >>>>>>>>> Thanks for this, I'll definitely take a look.
>> >>>>>>>>>
>> >>>>>>>>> Do you have a sense of how Denis' suggestion of using an
>> >>>>>>>>> EndOfStream
>> >>>>>>>>> object would compare?
>> >>>>>>>>>
>> >>>>>>>>> It would keep the same coding style, but avoid the problems with
>> >>>>>>>>> nil.
>> >>>>>>>>>
>> >>>>>>>>> Thanks,
>> >>>>>>>>> Alistair
>> >>>>>>>>>
>> >>>>>>>>>
>> >>>>>>>>>
>> >>>>>>>>>> I have regularly benched Xtreams, but stopped a few years ago.
>> >>>>>>>>>> Maybe i can excavate and pass on newer VM.
>> >>>>>>>>>>
>> >>>>>>>>>> In the mean time, i had experimented a programmable end of
>> >>>>>>>>>> stream behavior
>> >>>>>>>>>> (via a block, or any other valuable)
>> >>>>>>>>>> http://www.squeaksource.com/XTream.htm
>> >>>>>>>>>> so as to reconcile performance and universality, but it was a
>> >>>>>>>>>> source of
>> >>>>>>>>>> complexification at implementation side.
>> >>>>>>>>>>
>> >>>>>>>>>> Nicolas
>> >>>>>>>>>>
>> >>>>>>>>>>>
>> >>>>>>>>>>>
>> >>>>>>>>>>>> Note also that a Guille introduced something new, #closed
>> >>>>>>>>>>>> which is
>> >>>>>>>>>>>> related to the difference between having no more elements
>> >>>>>>>>>>>> (maybe right now,
>> >>>>>>>>>>>> like an open network stream) and never ever being able to
>> >>>>>>>>>>>> produce more data.
>> >>>>>>>>>>>>
>> >>>>>>>>>>>> Sven
>> >>>>>>>>
>> >>>>>>>>
>> >>>>>>>
>> >>>>>>
>> >>>>>>
>> >>>>>
>> >>>>
>> >>>
>> >>>
>> >> <stdio.cs>
>> >
>> >
>
>

Re: [Pharo-dev] Changed #atEnd primitive - #atEnd vs #next returning nil

Reply via email to