Hi Sven, On 10 April 2018 at 23:54, Sven Van Caekenberghe <s...@stfx.eu> wrote: > Alistair, > > I am replying in-between, please don't take my remarks as being negative, I > just like to straighten things out as clear as possible.
No problem. >> On 10 Apr 2018, at 22:14, Alistair Grant <akgrant0...@gmail.com> wrote: >> >> Hi Sven, >> >> On 10 April 2018 at 19:36, Sven Van Caekenberghe <s...@stfx.eu> wrote: >>> I have trouble understanding your problem analysis, and how your proposed >>> solution, would solve it. >> >> The discussion started with my proposal to modify the Zinc streams to >> check the return value of #next for nil rather than using #atEnd to >> iterate over the underlying stream. You correctly pointed out that >> the intention behind #atEnd is that it can be used to control >> iterating over streams. > > OK > >>> Where is it being said that #next and/or #atEnd should be blocking or >>> non-blocking ? >> >> There is existing code that assumes that #atEnd is non-blocking and >> that #next is allowed block. I believe that we should keep those >> conditions. > > I fail to see where that is written down, either way. Can you point me to > comments stating that, I would really like to know ? I'm not aware of it being written down, just that ever existing implementation I'm aware of behaves this way. On the other hand, making #atEnd blocking breaks Eliot's REPL sample (in Squeak). >>> How is this related to how EOF is signalled ? >> >> Because, combined with terminal EOF not being known until the user >> explicitly flags it (with Ctrl-D) it means that #atEnd can't be used >> for iterating over input from stdin connected to a terminal. > > This seems to me like an exception that only holds for one particular stream > in one particular scenario (interactive stdin). I might be wrong. > >>> It seems to me that there are quite a few classes of streams that are >>> 'special' in the sense that #next could be blocking and/or #atEnd could be >>> unclear - socket/network streams, serial streams, maybe stdio (interactive >>> or not). Without a message like #isDataAvailable you cannot handle those >>> without blocking. >> >> Right. I think this is a distraction (I was trying to explain some >> details, but it's causing more confusion instead of helping). >> >> The important point is that #atEnd doesn't work for iterating over >> streams with terminal input > > Maybe you should also point to the actual code that fails. I mean you showed > a partial stack trace, but not how you got there, precisely. How does the > application reading from an interactive stdin do to get into trouble ? Included below. >>> Reading from stdin seems like a very rare case for a Smalltalk system (not >>> that it should not be possible). >> >> There's been quite a bit of discussion and several projects recently >> related to using pharo for scripting, so it may become more common. >> E.g. >> >> https://www.quora.com/Can-Smalltalk-be-a-batch-file-scripting-language/answer/Philippe-Back-1?share=c19bfc95 >> https://github.com/rajula96reddy/pharo-cli > > Still, it is not common at all. > >>> I have a feeling that too much functionality is being pushed into too small >>> an API. >> >> This is just about how should Zinc streams be iterating over the >> underlying streams. You didn't like checking the result of #next for >> nil since it isn't general, correctly pointing out that nil is a valid >> value for non-byte oriented streams. But #atEnd doesn't work for >> stdin from a terminal. >> >> >> At this point I think there are three options: >> >> 1. Modify Zinc to check the return value of #next instead of using #atEnd. >> >> This is what all existing character / byte oriented streams in Squeak >> and Pharo do. At that point the Zinc streams can be used on all file >> / stdio input and output. > > I agree that such code exists in many places, but there is lots of stream > reading that does not check for nils. Right. Streams can be categorised in many ways, but for this discussion I think streams are broken in to two types: 1) Byte / Character oriented 2) All others For historical reasons, byte / character oriented streams need to check for EOF by using "stream next == nil" and all other streams should use #atEnd. This avoids the "nil being part of the domain" issue that was discussed earlier in the thread. >> 2. Modify all streams to signal EOF in some other way, i.e. a sentinel >> or notification / exception. >> >> This is what we were discussing below. But it is a decent chunk of >> work with significant impact on the existing code base. > > Agreed. This would be a future extension. > >> 3. Require anyone who wants to read from stdin to code around Zinc's >> inability to handle terminal input. >> >> I'd prefer to avoid this option if possible. > > See higher for a more concrete usage example request. testAtEnd.st -- | ch stream string stdin | 'stdio.cs' asFileReference fileIn. "stdin := FileStream stdin." stdin := ZnCharacterReadStream on: (ZnBufferedReadStream on: Stdio stdin). stream := (String new: 100) writeStream. ch := stdin next. [ ch == nil ] whileFalse: [ stream nextPut: ch. ch := stdin next. ]. string := stream contents. FileStream stdout nextPutAll: string; lf; nextPutAll: 'Characters read: '; nextPutAll: string size asString; lf. Smalltalk snapshot: false andQuit: true. -- Execute with: ./pharo --headless Pharo7.0-64bit-e76f1a2.image testAtEnd.st and type Ctrl-D gives: 'Errors in script loaded from testAtEnd.st' MessageNotUnderstood: receiver of "<" is nil UndefinedObject(Object)>>doesNotUnderstand: #< ZnUTF8Encoder>>nextCodePointFromStream: ZnUTF8Encoder(ZnCharacterEncoder)>>nextFromStream: ZnCharacterReadStream>>nextElement ZnCharacterReadStream(ZnEncodedReadStream)>>next UndefinedObject>>DoIt OpalCompiler>>evaluate Using #atEnd to control the loop instead of "stdin next == nil" produces the same result. Replacing stdin with FileStream stdin makes the script work. stdio.cs fixes a bug in StdioStream which really isn't part of this discussion (PR to be submitted). Cheers, Alistair >> Does that clarify the situation? > > Yes, it helps. Thanks. But questions remain. > >> Thanks, >> Alistair >> >> >> >>>> On 10 Apr 2018, at 18:30, Alistair Grant <akgrant0...@gmail.com> wrote: >>>> >>>> First a quick update: >>>> >>>> After doing some work on primitiveFileAtEnd, #atEnd now answers >>>> correctly for files that don't report their size correctly, e.g. >>>> /dev/urandom and /proc/cpuinfo, whether the files are opened directly or >>>> redirected through stdin. >>>> >>>> However determining whether stdin from a terminal has reached the end of >>>> file can't be done without making #atEnd blocking since we have to wait >>>> for the user to flag the end of file, e.g. by typing Ctrl-D. And #atEnd >>>> is assumed to be non-blocking. >>>> >>>> So currently using ZnCharacterReadStream with stdin from a terminal will >>>> result in a stack dump similar to: >>>> >>>> MessageNotUnderstood: receiver of "<" is nil >>>> UndefinedObject(Object)>>doesNotUnderstand: #< >>>> ZnUTF8Encoder>>nextCodePointFromStream: >>>> ZnUTF8Encoder(ZnCharacterEncoder)>>nextFromStream: >>>> ZnCharacterReadStream>>nextElement >>>> ZnCharacterReadStream(ZnEncodedReadStream)>>next >>>> UndefinedObject>>DoIt >>>> >>>> >>>> Going back through the various suggestions that have been made regarding >>>> using a sentinel object vs. raising a notification / exception, my >>>> (still to be polished) suggestion is to: >>>> >>>> 1. Add an endOfStream instance variable >>>> 2. When the end of the stream is reached answer the value of the >>>> instance variable (i.e. the result of sending #value to the variable). >>>> 3. The initial default value would be a block that raises a Deprecation >>>> warning and then returns nil. This would allow existing code to >>>> function for a changeover period. >>>> 4. At the end of the deprecation period the default value would be >>>> changed to a unique sentinel object which would answer itself as its >>>> #value. >>>> >>>> At any time users of the stream can set their own sentinel, including a >>>> block that raises an exception. >>>> >>>> >>>> Cheers, >>>> Alistair >>>> >>>> >>>> On 4 April 2018 at 19:24, Stephane Ducasse <stepharo.s...@gmail.com> wrote: >>>>> Thanks for this discussion. >>>>> >>>>> On Wed, Apr 4, 2018 at 1:37 PM, Sven Van Caekenberghe <s...@stfx.eu> >>>>> wrote: >>>>>> Alistair, >>>>>> >>>>>> First off, thanks for the discussions and your contributions, I really >>>>>> appreciate them. >>>>>> >>>>>> But I want to have a discussion at the high level of the definition and >>>>>> semantics of the stream API in Pharo. >>>>>> >>>>>>> On 4 Apr 2018, at 13:20, Alistair Grant <akgrant0...@gmail.com> wrote: >>>>>>> >>>>>>> On 4 April 2018 at 12:56, Sven Van Caekenberghe <s...@stfx.eu> wrote: >>>>>>>> Playing a bit devil's advocate, the idea is that, in general, >>>>>>>> >>>>>>>> [ stream atEnd] whileFalse: [ stream next. "..." ]. >>>>>>>> >>>>>>>> is no longer allowed ? >>>>>>> >>>>>>> It hasn't been allowed "forever" [1]. It's just been misused for >>>>>>> almost as long. >>>>>>> >>>>>>> [1] Time began when stdio stream support was introduced. :-) >>>>>> >>>>>> I am still not convinced. Another way to put it would be that the old >>>>>> #atEnd or #upToEnd do not make sense for these streams and some new loop >>>>>> is needed, based on a new test (it exists for socket streams already). >>>>>> >>>>>> [ stream isDataAvailable ] whileTrue: [ stream next ] >>>>>> >>>>>>>> And you want to replace it with >>>>>>>> >>>>>>>> [ stream next ifNil: [ false ] ifNotNil: [ :x | "..." true ] whileTrue. >>>>>>>> >>>>>>>> That is a pretty big change, no ? >>>>>>> >>>>>>> That's the way quite a bit of code already operates. >>>>>>> >>>>>>> As Denis pointed out, it's obviously problematic in the general sense, >>>>>>> since nil can be embedded in non-byte oriented streams. I suspect >>>>>>> that in practice not many people write code that reads streams from >>>>>>> both byte oriented and non-byte oriented streams. >>>>>> >>>>>> Maybe yes, maybe no. As Denis' example shows there is a clear definition >>>>>> problem. >>>>>> >>>>>> And I do use streams of byte arrays or strings all the time, this is >>>>>> really important. I want my parsers to work on all kinds of streams. >>>>>> >>>>>>>> I think/feel like a proper EOF exception would be better, more correct. >>>>>>>> >>>>>>>> [ [ stream next. "..." true ] on: EOF do: [ false ] ] whileTrue. >>>>>>> >>>>>>> I agree, but the email thread Nicolas pointed to raises some >>>>>>> performance questions about this approach. It should be >>>>>>> straightforward to do a basic performance comparison which I'll get >>>>>>> around to if other objections aren't raised. >>>>>> >>>>>> Reading in bigger blocks, using #readInto:startingAt:count: (which is >>>>>> basically Unix's (2) Read sys call), would solve performance problems, I >>>>>> think. >>>>>> >>>>>>>> Will we throw away #atEnd then ? Do we need it if we cannot use it ? >>>>>>> >>>>>>> Unix file i/o returns EOF if the end of file has been reach OR if an >>>>>>> error occurs. You should still check #atEnd after reading past the >>>>>>> end of the file to make sure no error occurred. Another part of the >>>>>>> primitive change I'm proposing is to return additional information >>>>>>> about what went wrong in the event of an error. >>>>>> >>>>>> I am sorry, but this kind of semantics (the OR) is way too complex at >>>>>> the general image level, it is too specific and based on certain >>>>>> underlying implementation details. >>>>>> >>>>>> Sven >>>>>> >>>>>>> We could modify the read primitive so that it fails if an error has >>>>>>> occurred, and then #atEnd wouldn't be required. >>>>>>> >>>>>>> Cheers, >>>>>>> Alistair >>>>>>> >>>>>>> >>>>>>> >>>>>>>>> On 4 Apr 2018, at 12:41, Alistair Grant <akgrant0...@gmail.com> wrote: >>>>>>>>> >>>>>>>>> Hi Nicolas, >>>>>>>>> >>>>>>>>> On 4 April 2018 at 12:36, Nicolas Cellier >>>>>>>>> <nicolas.cellier.aka.n...@gmail.com> wrote: >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> 2018-04-04 12:18 GMT+02:00 Alistair Grant <akgrant0...@gmail.com>: >>>>>>>>>>> >>>>>>>>>>> Hi Sven, >>>>>>>>>>> >>>>>>>>>>> On Wed, Apr 04, 2018 at 11:32:02AM +0200, Sven Van Caekenberghe >>>>>>>>>>> wrote: >>>>>>>>>>>> Somehow, somewhere there was a change to the implementation of the >>>>>>>>>>>> primitive called by some streams' #atEnd. >>>>>>>>>>> >>>>>>>>>>> That's a proposed change by me, but it hasn't been integrated yet. >>>>>>>>>>> So >>>>>>>>>>> the discussion below should apply to the current stable vm (from >>>>>>>>>>> August >>>>>>>>>>> last year). >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>>> IIRC, someone said it is implemented as 'remaining size being zero' >>>>>>>>>>>> and some virtual unix files like /dev/random are zero sized. >>>>>>>>>>> >>>>>>>>>>> Currently, for files other than sdio (stdout, stderr, stdin) it is >>>>>>>>>>> effectively defined as: >>>>>>>>>>> >>>>>>>>>>> atEnd := stream position >= stream size >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> And, as you say, plenty of virtual unix files report size 0. >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>>> Now, all kinds of changes are being done image size to work around >>>>>>>>>>>> this. >>>>>>>>>>> >>>>>>>>>>> I would phrase this slightly differently :-) >>>>>>>>>>> >>>>>>>>>>> Some code does the right thing, while other code doesn't. E.g.: >>>>>>>>>>> >>>>>>>>>>> MultiByteFileStream>>upToEnd is good, while >>>>>>>>>>> FileStream>>contents is incorrect >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>>> I am a strong believer in simple, real (i.e. infinite) streams, >>>>>>>>>>>> but I >>>>>>>>>>>> am not sure we are doing the right thing here. >>>>>>>>>>>> >>>>>>>>>>>> Point is, I am not sure #next returning nil is official and >>>>>>>>>>>> universal. >>>>>>>>>>>> >>>>>>>>>>>> Consider the comments: >>>>>>>>>>>> >>>>>>>>>>>> Stream>>#next >>>>>>>>>>>> "Answer the next object accessible by the receiver." >>>>>>>>>>>> >>>>>>>>>>>> ReadStream>>#next >>>>>>>>>>>> "Primitive. Answer the next object in the Stream represented by the >>>>>>>>>>>> receiver. Fail if the collection of this stream is not an Array or >>>>>>>>>>>> a >>>>>>>>>>>> String. >>>>>>>>>>>> Fail if the stream is positioned at its end, or if the position is >>>>>>>>>>>> out >>>>>>>>>>>> of >>>>>>>>>>>> bounds in the collection. Optional. See Object documentation >>>>>>>>>>>> whatIsAPrimitive." >>>>>>>>>>>> >>>>>>>>>>>> Note how there is no talk about returning nil ! >>>>>>>>>>>> >>>>>>>>>>>> I think we should discuss about this first. >>>>>>>>>>>> >>>>>>>>>>>> Was the low level change really correct and the right thing to do ? >>>>>>>>>>> >>>>>>>>>>> The primitive change proposed doesn't affect this discussion. It >>>>>>>>>>> will >>>>>>>>>>> mean that #atEnd returns false (correctly) sometimes, while >>>>>>>>>>> currently it >>>>>>>>>>> returns true (incorrectly). The end result is still incorrect, e.g. >>>>>>>>>>> #contents returns an empty string for /proc/cpuinfo. >>>>>>>>>>> >>>>>>>>>>> You're correct about no mention of nil, but we have: >>>>>>>>>>> >>>>>>>>>>> FileStream>>next >>>>>>>>>>> >>>>>>>>>>> (position >= readLimit and: [self atEnd]) >>>>>>>>>>> ifTrue: [^nil] >>>>>>>>>>> ifFalse: [^collection at: (position := position + 1)] >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> which has been around for a long time (I suspect, before Pharo >>>>>>>>>>> existed). >>>>>>>>>>> >>>>>>>>>>> Having said that, I think that raising an exception is a better >>>>>>>>>>> solution, but it is a much, much bigger change than the one I >>>>>>>>>>> proposed >>>>>>>>>>> in https://github.com/pharo-project/pharo/pull/1180. >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> Cheers, >>>>>>>>>>> Alistair >>>>>>>>>>> >>>>>>>>>> >>>>>>>>>> Hi, >>>>>>>>>> yes, if you are after universal behavior englobing Unix streams, the >>>>>>>>>> Exception might be the best way. >>>>>>>>>> Because on special stream you can't allways say in advance, you have >>>>>>>>>> to try. >>>>>>>>>> That's the solution adopted by authors of Xtreams. >>>>>>>>>> But there is a runtime penalty associated to it. >>>>>>>>>> >>>>>>>>>> The penalty once was so high that my proposal to generalize >>>>>>>>>> EndOfStream >>>>>>>>>> usage was rejected a few years ago by AndreaRaab. >>>>>>>>>> http://forum.world.st/EndOfStream-unused-td68806.html >>>>>>>>> >>>>>>>>> Thanks for this, I'll definitely take a look. >>>>>>>>> >>>>>>>>> Do you have a sense of how Denis' suggestion of using an EndOfStream >>>>>>>>> object would compare? >>>>>>>>> >>>>>>>>> It would keep the same coding style, but avoid the problems with nil. >>>>>>>>> >>>>>>>>> Thanks, >>>>>>>>> Alistair >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>>> I have regularly benched Xtreams, but stopped a few years ago. >>>>>>>>>> Maybe i can excavate and pass on newer VM. >>>>>>>>>> >>>>>>>>>> In the mean time, i had experimented a programmable end of stream >>>>>>>>>> behavior >>>>>>>>>> (via a block, or any other valuable) >>>>>>>>>> http://www.squeaksource.com/XTream.htm >>>>>>>>>> so as to reconcile performance and universality, but it was a source >>>>>>>>>> of >>>>>>>>>> complexification at implementation side. >>>>>>>>>> >>>>>>>>>> Nicolas >>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>>> Note also that a Guille introduced something new, #closed which is >>>>>>>>>>>> related to the difference between having no more elements (maybe >>>>>>>>>>>> right now, >>>>>>>>>>>> like an open network stream) and never ever being able to produce >>>>>>>>>>>> more data. >>>>>>>>>>>> >>>>>>>>>>>> Sven >>>>>>>> >>>>>>>> >>>>>>> >>>>>> >>>>>> >>>>> >>>> >>> >>> >> <stdio.cs> > >
'From Pharo7.0alpha of 5 April 2018 [Build information: Pharo-7.0+alpha.build.756.sha.de40aebd759f4b81a41094992741c67e031f6b3f (64 Bit)] on 9 April 2018 at 8:59:31.566446 am'! !AbstractBinaryFileStream methodsFor: 'private' stamp: 'AlistairGrant 4/9/2018 08:53'! primAtEnd: id "Answer true if the file position is at the end of the file." <primitive: 'primitiveFileAtEnd' module: 'FilePlugin'> self primitiveFailed ! ! !AbstractBinaryFileStream methodsFor: 'testing' stamp: 'AlistairGrant 4/9/2018 08:53'! atEnd ^ self primAtEnd: handle! ! StdioStream removeSelector: #atEnd! !StdioStream reorganize! (#positioning position) (#accessing peek next:) ! BinaryFileStream removeSelector: #atEnd! BinaryFileStream removeSelector: #primAtEnd:! !BinaryFileStream reorganize! (#positioning setToEnd position truncate: skip: truncate position: reset) (#testing closed) (#flushing sync flush) (#finalization finalize register unregister) (#'open/close' close) (#private primGetPosition: primSize: primSizeNoError: primCloseNoError: primSetPosition:to:) (#accessing peek size upToEnd) !