Re: [Pharo-dev] Changed #atEnd primitive - #atEnd vs #next returning nil

Alistair Grant Tue, 10 Apr 2018 23:33:52 -0700

Hi Sven,

On 10 April 2018 at 23:54, Sven Van Caekenberghe <s...@stfx.eu> wrote:
> Alistair,
>
> I am replying in-between, please don't take my remarks as being negative, I 
> just like to straighten things out as clear as possible.


No problem.


>> On 10 Apr 2018, at 22:14, Alistair Grant <akgrant0...@gmail.com> wrote:
>>
>> Hi Sven,
>>
>> On 10 April 2018 at 19:36, Sven Van Caekenberghe <s...@stfx.eu> wrote:
>>> I have trouble understanding your problem analysis, and how your proposed 
>>> solution, would solve it.
>>
>> The discussion started with my proposal to modify the Zinc streams to
>> check the return value of #next for nil rather than using #atEnd to
>> iterate over the underlying stream.  You correctly pointed out that
>> the intention behind #atEnd is that it can be used to control
>> iterating over streams.
>
> OK
>
>>> Where is it being said that #next and/or #atEnd should be blocking or 
>>> non-blocking ?
>>
>> There is existing code that assumes that #atEnd is non-blocking and
>> that #next is allowed block.  I believe that we should keep those
>> conditions.
>
> I fail to see where that is written down, either way. Can you point me to 
> comments stating that, I would really like to know ?

I'm not aware of it being written down, just that ever existing
implementation I'm aware of behaves this way.

On the other hand, making #atEnd blocking breaks Eliot's REPL sample
(in Squeak).



>>> How is this related to how EOF is signalled ?
>>
>> Because, combined with terminal EOF not being known until the user
>> explicitly flags it (with Ctrl-D) it means that #atEnd can't be used
>> for iterating over input from stdin connected to a terminal.
>
> This seems to me like an exception that only holds for one particular stream 
> in one particular scenario (interactive stdin). I might be wrong.
>
>>> It seems to me that there are quite a few classes of streams that are 
>>> 'special' in the sense that #next could be blocking and/or #atEnd could be 
>>> unclear - socket/network streams, serial streams, maybe stdio (interactive 
>>> or not). Without a message like #isDataAvailable you cannot handle those 
>>> without blocking.
>>
>> Right.  I think this is a distraction (I was trying to explain some
>> details, but it's causing more confusion instead of helping).
>>
>> The important point is that #atEnd doesn't work for iterating over
>> streams with terminal input
>
> Maybe you should also point to the actual code that fails. I mean you showed 
> a partial stack trace, but not how you got there, precisely. How does the 
> application reading from an interactive stdin do to get into trouble ?

Included below.


>>> Reading from stdin seems like a very rare case for a Smalltalk system (not 
>>> that it should not be possible).
>>
>> There's been quite a bit of discussion and several projects recently
>> related to using pharo for scripting, so it may become more common.
>> E.g.
>>
>> https://www.quora.com/Can-Smalltalk-be-a-batch-file-scripting-language/answer/Philippe-Back-1?share=c19bfc95
>> https://github.com/rajula96reddy/pharo-cli
>
> Still, it is not common at all.
>
>>> I have a feeling that too much functionality is being pushed into too small 
>>> an API.
>>
>> This is just about how should Zinc streams be iterating over the
>> underlying streams.  You didn't like checking the result of #next for
>> nil since it isn't general, correctly pointing out that nil is a valid
>> value for non-byte oriented streams.  But #atEnd doesn't work for
>> stdin from a terminal.
>>
>>
>> At this point I think there are three options:
>>
>> 1. Modify Zinc to check the return value of #next instead of using #atEnd.
>>
>> This is what all existing character / byte oriented streams in Squeak
>> and Pharo do.  At that point the Zinc streams can be used on all file
>> / stdio input and output.
>
> I agree that such code exists in many places, but there is lots of stream 
> reading that does not check for nils.

Right.  Streams can be categorised in many ways, but for this
discussion I think streams are broken in to two types:

1) Byte / Character oriented
2) All others

For historical reasons, byte / character oriented streams need to
check for EOF by using "stream next == nil" and all other streams
should use #atEnd.

This avoids the "nil being part of the domain" issue that was
discussed earlier in the thread.


>> 2. Modify all streams to signal EOF in some other way, i.e. a sentinel
>> or notification / exception.
>>
>> This is what we were discussing below.  But it is a decent chunk of
>> work with significant impact on the existing code base.
>
> Agreed. This would be a future extension.
>
>> 3. Require anyone who wants to read from stdin to code around Zinc's
>> inability to handle terminal input.
>>
>> I'd prefer to avoid this option if possible.
>
> See higher for a more concrete usage example request.


testAtEnd.st
--
| ch stream string stdin |

'stdio.cs' asFileReference fileIn.
"stdin := FileStream stdin."
stdin := ZnCharacterReadStream on:
    (ZnBufferedReadStream on:
        Stdio stdin).
stream := (String new: 100) writeStream.
ch := stdin next.
[ ch == nil ] whileFalse: [
    stream nextPut: ch.
    ch := stdin next. ].
string := stream contents.
FileStream stdout
    nextPutAll: string; lf;
    nextPutAll: 'Characters read: ';
    nextPutAll: string size asString;
    lf.
Smalltalk snapshot: false andQuit: true.
--

Execute with:

./pharo --headless Pharo7.0-64bit-e76f1a2.image testAtEnd.st

and type Ctrl-D gives:


'Errors in script loaded from testAtEnd.st'
MessageNotUnderstood: receiver of "<" is nil
UndefinedObject(Object)>>doesNotUnderstand: #<
ZnUTF8Encoder>>nextCodePointFromStream:
ZnUTF8Encoder(ZnCharacterEncoder)>>nextFromStream:
ZnCharacterReadStream>>nextElement
ZnCharacterReadStream(ZnEncodedReadStream)>>next
UndefinedObject>>DoIt
OpalCompiler>>evaluate


Using #atEnd to control the loop instead of "stdin next == nil"
produces the same result.

Replacing stdin with FileStream stdin makes the script work.

stdio.cs fixes a bug in StdioStream which really isn't part of this
discussion (PR to be submitted).

Cheers,
Alistair




>> Does that clarify the situation?
>
> Yes, it helps. Thanks. But questions remain.
>
>> Thanks,
>> Alistair
>>
>>
>>
>>>> On 10 Apr 2018, at 18:30, Alistair Grant <akgrant0...@gmail.com> wrote:
>>>>
>>>> First a quick update:
>>>>
>>>> After doing some work on primitiveFileAtEnd, #atEnd now answers
>>>> correctly for files that don't report their size correctly, e.g.
>>>> /dev/urandom and /proc/cpuinfo, whether the files are opened directly or
>>>> redirected through stdin.
>>>>
>>>> However determining whether stdin from a terminal has reached the end of
>>>> file can't be done without making #atEnd blocking since we have to wait
>>>> for the user to flag the end of file, e.g. by typing Ctrl-D.  And #atEnd
>>>> is assumed to be non-blocking.
>>>>
>>>> So currently using ZnCharacterReadStream with stdin from a terminal will
>>>> result in a stack dump similar to:
>>>>
>>>> MessageNotUnderstood: receiver of "<" is nil
>>>> UndefinedObject(Object)>>doesNotUnderstand: #<
>>>> ZnUTF8Encoder>>nextCodePointFromStream:
>>>> ZnUTF8Encoder(ZnCharacterEncoder)>>nextFromStream:
>>>> ZnCharacterReadStream>>nextElement
>>>> ZnCharacterReadStream(ZnEncodedReadStream)>>next
>>>> UndefinedObject>>DoIt
>>>>
>>>>
>>>> Going back through the various suggestions that have been made regarding
>>>> using a sentinel object vs. raising a notification / exception, my
>>>> (still to be polished) suggestion is to:
>>>>
>>>> 1. Add an endOfStream instance variable
>>>> 2. When the end of the stream is reached answer the value of the
>>>>  instance variable (i.e. the result of sending #value to the variable).
>>>> 3. The initial default value would be a block that raises a Deprecation
>>>>  warning and then returns nil.  This would allow existing code to
>>>>  function for a changeover period.
>>>> 4. At the end of the deprecation period the default value would be
>>>>  changed to a unique sentinel object which would answer itself as its
>>>>  #value.
>>>>
>>>> At any time users of the stream can set their own sentinel, including a
>>>> block that raises an exception.
>>>>
>>>>
>>>> Cheers,
>>>> Alistair
>>>>
>>>>
>>>> On 4 April 2018 at 19:24, Stephane Ducasse <stepharo.s...@gmail.com> wrote:
>>>>> Thanks for this discussion.
>>>>>
>>>>> On Wed, Apr 4, 2018 at 1:37 PM, Sven Van Caekenberghe <s...@stfx.eu> 
>>>>> wrote:
>>>>>> Alistair,
>>>>>>
>>>>>> First off, thanks for the discussions and your contributions, I really 
>>>>>> appreciate them.
>>>>>>
>>>>>> But I want to have a discussion at the high level of the definition and 
>>>>>> semantics of the stream API in Pharo.
>>>>>>
>>>>>>> On 4 Apr 2018, at 13:20, Alistair Grant <akgrant0...@gmail.com> wrote:
>>>>>>>
>>>>>>> On 4 April 2018 at 12:56, Sven Van Caekenberghe <s...@stfx.eu> wrote:
>>>>>>>> Playing a bit devil's advocate, the idea is that, in general,
>>>>>>>>
>>>>>>>> [ stream atEnd] whileFalse: [ stream next. "..." ].
>>>>>>>>
>>>>>>>> is no longer allowed ?
>>>>>>>
>>>>>>> It hasn't been allowed "forever" [1].  It's just been misused for
>>>>>>> almost as long.
>>>>>>>
>>>>>>> [1] Time began when stdio stream support was introduced. :-)
>>>>>>
>>>>>> I am still not convinced. Another way to put it would be that the old 
>>>>>> #atEnd or #upToEnd do not make sense for these streams and some new loop 
>>>>>> is needed, based on a new test (it exists for socket streams already).
>>>>>>
>>>>>> [ stream isDataAvailable ] whileTrue: [ stream next ]
>>>>>>
>>>>>>>> And you want to replace it with
>>>>>>>>
>>>>>>>> [ stream next ifNil: [ false ] ifNotNil: [ :x | "..." true ] whileTrue.
>>>>>>>>
>>>>>>>> That is a pretty big change, no ?
>>>>>>>
>>>>>>> That's the way quite a bit of code already operates.
>>>>>>>
>>>>>>> As Denis pointed out, it's obviously problematic in the general sense,
>>>>>>> since nil can be embedded in non-byte oriented streams.  I suspect
>>>>>>> that in practice not many people write code that reads streams from
>>>>>>> both byte oriented and non-byte oriented streams.
>>>>>>
>>>>>> Maybe yes, maybe no. As Denis' example shows there is a clear definition 
>>>>>> problem.
>>>>>>
>>>>>> And I do use streams of byte arrays or strings all the time, this is 
>>>>>> really important. I want my parsers to work on all kinds of streams.
>>>>>>
>>>>>>>> I think/feel like a proper EOF exception would be better, more correct.
>>>>>>>>
>>>>>>>> [ [ stream next. "..." true ] on: EOF do: [ false ] ] whileTrue.
>>>>>>>
>>>>>>> I agree, but the email thread Nicolas pointed to raises some
>>>>>>> performance questions about this approach.  It should be
>>>>>>> straightforward to do a basic performance comparison which I'll get
>>>>>>> around to if other objections aren't raised.
>>>>>>
>>>>>> Reading in bigger blocks, using #readInto:startingAt:count: (which is 
>>>>>> basically Unix's (2) Read sys call), would solve performance problems, I 
>>>>>> think.
>>>>>>
>>>>>>>> Will we throw away #atEnd then ? Do we need it if we cannot use it ?
>>>>>>>
>>>>>>> Unix file i/o returns EOF if the end of file has been reach OR if an
>>>>>>> error occurs.  You should still check #atEnd after reading past the
>>>>>>> end of the file to make sure no error occurred.  Another part of the
>>>>>>> primitive change I'm proposing is to return additional information
>>>>>>> about what went wrong in the event of an error.
>>>>>>
>>>>>> I am sorry, but this kind of semantics (the OR) is way too complex at 
>>>>>> the general image level, it is too specific and based on certain 
>>>>>> underlying implementation details.
>>>>>>
>>>>>> Sven
>>>>>>
>>>>>>> We could modify the read primitive so that it fails if an error has
>>>>>>> occurred, and then #atEnd wouldn't be required.
>>>>>>>
>>>>>>> Cheers,
>>>>>>> Alistair
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>>> On 4 Apr 2018, at 12:41, Alistair Grant <akgrant0...@gmail.com> wrote:
>>>>>>>>>
>>>>>>>>> Hi Nicolas,
>>>>>>>>>
>>>>>>>>> On 4 April 2018 at 12:36, Nicolas Cellier
>>>>>>>>> <nicolas.cellier.aka.n...@gmail.com> wrote:
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> 2018-04-04 12:18 GMT+02:00 Alistair Grant <akgrant0...@gmail.com>:
>>>>>>>>>>>
>>>>>>>>>>> Hi Sven,
>>>>>>>>>>>
>>>>>>>>>>> On Wed, Apr 04, 2018 at 11:32:02AM +0200, Sven Van Caekenberghe 
>>>>>>>>>>> wrote:
>>>>>>>>>>>> Somehow, somewhere there was a change to the implementation of the
>>>>>>>>>>>> primitive called by some streams' #atEnd.
>>>>>>>>>>>
>>>>>>>>>>> That's a proposed change by me, but it hasn't been integrated yet.  
>>>>>>>>>>> So
>>>>>>>>>>> the discussion below should apply to the current stable vm (from 
>>>>>>>>>>> August
>>>>>>>>>>> last year).
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>> IIRC, someone said it is implemented as 'remaining size being zero'
>>>>>>>>>>>> and some virtual unix files like /dev/random are zero sized.
>>>>>>>>>>>
>>>>>>>>>>> Currently, for files other than sdio (stdout, stderr, stdin) it is
>>>>>>>>>>> effectively defined as:
>>>>>>>>>>>
>>>>>>>>>>> atEnd := stream position >= stream size
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> And, as you say, plenty of virtual unix files report size 0.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>> Now, all kinds of changes are being done image size to work around 
>>>>>>>>>>>> this.
>>>>>>>>>>>
>>>>>>>>>>> I would phrase this slightly differently :-)
>>>>>>>>>>>
>>>>>>>>>>> Some code does the right thing, while other code doesn't.  E.g.:
>>>>>>>>>>>
>>>>>>>>>>> MultiByteFileStream>>upToEnd is good, while
>>>>>>>>>>> FileStream>>contents is incorrect
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>> I am a strong believer in simple, real (i.e. infinite) streams, 
>>>>>>>>>>>> but I
>>>>>>>>>>>> am not sure we are doing the right thing here.
>>>>>>>>>>>>
>>>>>>>>>>>> Point is, I am not sure #next returning nil is official and 
>>>>>>>>>>>> universal.
>>>>>>>>>>>>
>>>>>>>>>>>> Consider the comments:
>>>>>>>>>>>>
>>>>>>>>>>>> Stream>>#next
>>>>>>>>>>>> "Answer the next object accessible by the receiver."
>>>>>>>>>>>>
>>>>>>>>>>>> ReadStream>>#next
>>>>>>>>>>>> "Primitive. Answer the next object in the Stream represented by the
>>>>>>>>>>>> receiver. Fail if the collection of this stream is not an Array or 
>>>>>>>>>>>> a
>>>>>>>>>>>> String.
>>>>>>>>>>>> Fail if the stream is positioned at its end, or if the position is 
>>>>>>>>>>>> out
>>>>>>>>>>>> of
>>>>>>>>>>>> bounds in the collection. Optional. See Object documentation
>>>>>>>>>>>> whatIsAPrimitive."
>>>>>>>>>>>>
>>>>>>>>>>>> Note how there is no talk about returning nil !
>>>>>>>>>>>>
>>>>>>>>>>>> I think we should discuss about this first.
>>>>>>>>>>>>
>>>>>>>>>>>> Was the low level change really correct and the right thing to do ?
>>>>>>>>>>>
>>>>>>>>>>> The primitive change proposed doesn't affect this discussion.  It 
>>>>>>>>>>> will
>>>>>>>>>>> mean that #atEnd returns false (correctly) sometimes, while 
>>>>>>>>>>> currently it
>>>>>>>>>>> returns true (incorrectly).  The end result is still incorrect, e.g.
>>>>>>>>>>> #contents returns an empty string for /proc/cpuinfo.
>>>>>>>>>>>
>>>>>>>>>>> You're correct about no mention of nil, but we have:
>>>>>>>>>>>
>>>>>>>>>>> FileStream>>next
>>>>>>>>>>>
>>>>>>>>>>>     (position >= readLimit and: [self atEnd])
>>>>>>>>>>>             ifTrue: [^nil]
>>>>>>>>>>>             ifFalse: [^collection at: (position := position + 1)]
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> which has been around for a long time (I suspect, before Pharo 
>>>>>>>>>>> existed).
>>>>>>>>>>>
>>>>>>>>>>> Having said that, I think that raising an exception is a better
>>>>>>>>>>> solution, but it is a much, much bigger change than the one I 
>>>>>>>>>>> proposed
>>>>>>>>>>> in https://github.com/pharo-project/pharo/pull/1180.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Cheers,
>>>>>>>>>>> Alistair
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Hi,
>>>>>>>>>> yes, if you are after universal behavior englobing Unix streams, the
>>>>>>>>>> Exception might be the best way.
>>>>>>>>>> Because on special stream you can't allways say in advance, you have 
>>>>>>>>>> to try.
>>>>>>>>>> That's the solution adopted by authors of Xtreams.
>>>>>>>>>> But there is a runtime penalty associated to it.
>>>>>>>>>>
>>>>>>>>>> The penalty once was so high that my proposal to generalize 
>>>>>>>>>> EndOfStream
>>>>>>>>>> usage was rejected a few years ago by AndreaRaab.
>>>>>>>>>> http://forum.world.st/EndOfStream-unused-td68806.html
>>>>>>>>>
>>>>>>>>> Thanks for this, I'll definitely take a look.
>>>>>>>>>
>>>>>>>>> Do you have a sense of how Denis' suggestion of using an EndOfStream
>>>>>>>>> object would compare?
>>>>>>>>>
>>>>>>>>> It would keep the same coding style, but avoid the problems with nil.
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>> Alistair
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>> I have regularly benched Xtreams, but stopped a few years ago.
>>>>>>>>>> Maybe i can excavate and pass on newer VM.
>>>>>>>>>>
>>>>>>>>>> In the mean time, i had experimented a programmable end of stream 
>>>>>>>>>> behavior
>>>>>>>>>> (via a block, or any other valuable)
>>>>>>>>>> http://www.squeaksource.com/XTream.htm
>>>>>>>>>> so as to reconcile performance and universality, but it was a source 
>>>>>>>>>> of
>>>>>>>>>> complexification at implementation side.
>>>>>>>>>>
>>>>>>>>>> Nicolas
>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>> Note also that a Guille introduced something new, #closed which is
>>>>>>>>>>>> related to the difference between having no more elements (maybe 
>>>>>>>>>>>> right now,
>>>>>>>>>>>> like an open network stream) and never ever being able to produce 
>>>>>>>>>>>> more data.
>>>>>>>>>>>>
>>>>>>>>>>>> Sven
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>>
>> <stdio.cs>
>
>

'From Pharo7.0alpha of 5 April 2018 [Build information: Pharo-7.0+alpha.build.756.sha.de40aebd759f4b81a41094992741c67e031f6b3f (64 Bit)] on 9 April 2018 at 8:59:31.566446 am'!

!AbstractBinaryFileStream methodsFor: 'private' stamp: 'AlistairGrant 4/9/2018 08:53'!
primAtEnd: id
	"Answer true if the file position is at the end of the file."

	<primitive: 'primitiveFileAtEnd' module: 'FilePlugin'>
	self primitiveFailed
! !

!AbstractBinaryFileStream methodsFor: 'testing' stamp: 'AlistairGrant 4/9/2018 08:53'!
atEnd

	^ self primAtEnd: handle! !

StdioStream removeSelector: #atEnd!

!StdioStream reorganize!
(#positioning position)
(#accessing peek next:)
!

BinaryFileStream removeSelector: #atEnd!
BinaryFileStream removeSelector: #primAtEnd:!

!BinaryFileStream reorganize!
(#positioning setToEnd position truncate: skip: truncate position: reset)
(#testing closed)
(#flushing sync flush)
(#finalization finalize register unregister)
(#'open/close' close)
(#private primGetPosition: primSize: primSizeNoError: primCloseNoError: primSetPosition:to:)
(#accessing peek size upToEnd)
!

Re: [Pharo-dev] Changed #atEnd primitive - #atEnd vs #next returning nil

Reply via email to