Thanks for this discussion.

On Wed, Apr 4, 2018 at 1:37 PM, Sven Van Caekenberghe <s...@stfx.eu> wrote:
> Alistair,
>
> First off, thanks for the discussions and your contributions, I really 
> appreciate them.
>
> But I want to have a discussion at the high level of the definition and 
> semantics of the stream API in Pharo.
>
>> On 4 Apr 2018, at 13:20, Alistair Grant <akgrant0...@gmail.com> wrote:
>>
>> On 4 April 2018 at 12:56, Sven Van Caekenberghe <s...@stfx.eu> wrote:
>>> Playing a bit devil's advocate, the idea is that, in general,
>>>
>>> [ stream atEnd] whileFalse: [ stream next. "..." ].
>>>
>>> is no longer allowed ?
>>
>> It hasn't been allowed "forever" [1].  It's just been misused for
>> almost as long.
>>
>> [1] Time began when stdio stream support was introduced. :-)
>
> I am still not convinced. Another way to put it would be that the old #atEnd 
> or #upToEnd do not make sense for these streams and some new loop is needed, 
> based on a new test (it exists for socket streams already).
>
> [ stream isDataAvailable ] whileTrue: [ stream next ]
>
>>> And you want to replace it with
>>>
>>> [ stream next ifNil: [ false ] ifNotNil: [ :x | "..." true ] ] whileTrue.
>>>
>>> That is a pretty big change, no ?
>>
>> That's the way quite a bit of code already operates.
>>
>> As Denis pointed out, it's obviously problematic in the general sense,
>> since nil can be embedded in non-byte oriented streams.  I suspect
>> that in practice not many people write code that reads streams from
>> both byte oriented and non-byte oriented streams.
>
> Maybe yes, maybe no. As Denis' example shows there is a clear definition 
> problem.
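
To make the definition problem concrete, here is a minimal Playground sketch. It assumes a Pharo image, with an in-memory ReadStream standing in for any non-byte stream; the variable names are mine, not from anyone's code in this thread. Both loop styles read the same three elements, one of which is nil.

```smalltalk
| items stream viaAtEnd viaNil |
"Assumption: an in-memory stream with an embedded nil element."
items := { 1. nil. 3 }.

"Style 1: test #atEnd before every read."
stream := ReadStream on: items.
viaAtEnd := OrderedCollection new.
[ stream atEnd ] whileFalse: [ viaAtEnd add: stream next ].

"Style 2: treat a nil answer from #next as the end of the stream."
stream := ReadStream on: items.
viaNil := OrderedCollection new.
[ stream next
    ifNil: [ false ]
    ifNotNil: [ :each | viaNil add: each. true ] ] whileTrue.

"viaAtEnd holds all three elements; viaNil stops at the embedded nil."
{ viaAtEnd size. viaNil size }
```

The #atEnd loop collects all three elements, while the nil-terminated loop gives up at the embedded nil after the first one.
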
>
> And I do use streams of byte arrays or strings all the time, this is really 
> important. I want my parsers to work on all kinds of streams.
>
>>> I think/feel like a proper EOF exception would be better, more correct.
>>>
>>> [ [ stream next. "..." true ] on: EOF do: [ false ] ] whileTrue.
>>
>> I agree, but the email thread Nicolas pointed to raises some
>> performance questions about this approach.  It should be
>> straightforward to do a basic performance comparison which I'll get
>> around to if other objections aren't raised.
>
> Reading in bigger blocks, using #readInto:startingAt:count: (which is 
> basically Unix's read(2) system call), would solve performance problems, I 
> think.
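
A rough sketch of what reading in bigger blocks could look like. It assumes #readInto:startingAt:count: answers the number of elements actually read, and 0 once nothing is left, which is how I understand the in-memory ReadStream to behave. The 4096-byte buffer and the 10000-byte test data are arbitrary choices for illustration.

```smalltalk
| stream buffer count total |
"Assumption: an in-memory byte stream stands in for a file stream."
stream := ReadStream on: (ByteArray new: 10000 withAll: 7).
buffer := ByteArray new: 4096.
total := 0.

"Keep reading whole blocks until a read answers zero elements."
[ count := stream readInto: buffer startingAt: 1 count: buffer size.
  total := total + count.
  count > 0 ] whileTrue.

total
```

With this style the end-of-stream check happens once per block rather than once per element, so the cost of the check (exception or otherwise) matters much less.
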
>
>>> Will we throw away #atEnd then ? Do we need it if we cannot use it ?
>>
>> Unix file i/o returns EOF if the end of file has been reached OR if an
>> error occurs.  You should still check #atEnd after reading past the
>> end of the file to make sure no error occurred.  Another part of the
>> primitive change I'm proposing is to return additional information
>> about what went wrong in the event of an error.
>
> I am sorry, but this kind of semantics (the OR) is way too complex at the 
> general image level, it is too specific and based on certain underlying 
> implementation details.
>
> Sven
>
>> We could modify the read primitive so that it fails if an error has
>> occurred, and then #atEnd wouldn't be required.
>>
>> Cheers,
>> Alistair
>>
>>
>>
>>>> On 4 Apr 2018, at 12:41, Alistair Grant <akgrant0...@gmail.com> wrote:
>>>>
>>>> Hi Nicolas,
>>>>
>>>> On 4 April 2018 at 12:36, Nicolas Cellier
>>>> <nicolas.cellier.aka.n...@gmail.com> wrote:
>>>>>
>>>>>
>>>>> 2018-04-04 12:18 GMT+02:00 Alistair Grant <akgrant0...@gmail.com>:
>>>>>>
>>>>>> Hi Sven,
>>>>>>
>>>>>> On Wed, Apr 04, 2018 at 11:32:02AM +0200, Sven Van Caekenberghe wrote:
>>>>>>> Somehow, somewhere there was a change to the implementation of the
>>>>>>> primitive called by some streams' #atEnd.
>>>>>>
>>>>>> That's a proposed change by me, but it hasn't been integrated yet.  So
>>>>>> the discussion below should apply to the current stable vm (from August
>>>>>> last year).
>>>>>>
>>>>>>
>>>>>>> IIRC, someone said it is implemented as 'remaining size being zero'
>>>>>>> and some virtual unix files like /dev/random are zero sized.
>>>>>>
>>>>>> Currently, for files other than stdio (stdout, stderr, stdin) it is
>>>>>> effectively defined as:
>>>>>>
>>>>>> atEnd := stream position >= stream size
>>>>>>
>>>>>>
>>>>>> And, as you say, plenty of virtual unix files report size 0.
>>>>>>
>>>>>>
>>>>>>
>>>>>>> Now, all kinds of changes are being done image side to work around this.
>>>>>>
>>>>>> I would phrase this slightly differently :-)
>>>>>>
>>>>>> Some code does the right thing, while other code doesn't.  E.g.:
>>>>>>
>>>>>> MultiByteFileStream>>upToEnd is good, while
>>>>>> FileStream>>contents is incorrect
>>>>>>
>>>>>>
>>>>>>> I am a strong believer in simple, real (i.e. infinite) streams, but I
>>>>>>> am not sure we are doing the right thing here.
>>>>>>>
>>>>>>> Point is, I am not sure #next returning nil is official and universal.
>>>>>>>
>>>>>>> Consider the comments:
>>>>>>>
>>>>>>> Stream>>#next
>>>>>>> "Answer the next object accessible by the receiver."
>>>>>>>
>>>>>>> ReadStream>>#next
>>>>>>> "Primitive. Answer the next object in the Stream represented by the
>>>>>>> receiver. Fail if the collection of this stream is not an Array or a
>>>>>>> String. Fail if the stream is positioned at its end, or if the position
>>>>>>> is out of bounds in the collection. Optional. See Object documentation
>>>>>>> whatIsAPrimitive."
>>>>>>>
>>>>>>> Note how there is no talk about returning nil !
>>>>>>>
>>>>>>> I think we should discuss this first.
>>>>>>>
>>>>>>> Was the low level change really correct and the right thing to do ?
>>>>>>
>>>>>> The primitive change proposed doesn't affect this discussion.  It will
>>>>>> mean that #atEnd returns false (correctly) sometimes, while currently it
>>>>>> returns true (incorrectly).  The end result is still incorrect, e.g.
>>>>>> #contents returns an empty string for /proc/cpuinfo.
>>>>>>
>>>>>> You're correct about no mention of nil, but we have:
>>>>>>
>>>>>> FileStream>>next
>>>>>>
>>>>>>       (position >= readLimit and: [self atEnd])
>>>>>>               ifTrue: [^nil]
>>>>>>               ifFalse: [^collection at: (position := position + 1)]
>>>>>>
>>>>>>
>>>>>> which has been around for a long time (I suspect, before Pharo existed).
>>>>>>
>>>>>> Having said that, I think that raising an exception is a better
>>>>>> solution, but it is a much, much bigger change than the one I proposed
>>>>>> in https://github.com/pharo-project/pharo/pull/1180.
>>>>>>
>>>>>>
>>>>>> Cheers,
>>>>>> Alistair
>>>>>>
>>>>>
>>>>> Hi,
>>>>> yes, if you are after universal behavior encompassing Unix streams, the
>>>>> Exception might be the best way.
>>>>> Because on special streams you can't always say in advance, you have to
>>>>> try.
>>>>> That's the solution adopted by authors of Xtreams.
>>>>> But there is a runtime penalty associated to it.
>>>>>
>>>>> The penalty once was so high that my proposal to generalize EndOfStream
>>>>> usage was rejected a few years ago by Andreas Raab.
>>>>> http://forum.world.st/EndOfStream-unused-td68806.html
>>>>
>>>> Thanks for this, I'll definitely take a look.
>>>>
>>>> Do you have a sense of how Denis' suggestion of using an EndOfStream
>>>> object would compare?
>>>>
>>>> It would keep the same coding style, but avoid the problems with nil.
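
For comparison, a minimal sketch of the end marker idea. The marker here is just a fresh Object compared by identity, and the readNext block is a hypothetical stand-in for a #next that answers the marker at the end; neither is an existing Pharo API. It leans on #atEnd of an in-memory stream, so it only illustrates the coding style, not the file stream problem.

```smalltalk
| endMarker stream readNext collected item |
"Hypothetical end marker: a unique object no stream element can be identical to."
endMarker := Object new.
stream := ReadStream on: { 1. nil. 3 }.

"Hypothetical replacement for #next that answers the marker instead of nil."
readNext := [ stream atEnd
    ifTrue: [ endMarker ]
    ifFalse: [ stream next ] ].

"Same loop shape as the nil test, but an embedded nil no longer ends it."
collected := OrderedCollection new.
[ (item := readNext value) == endMarker ]
    whileFalse: [ collected add: item ].

collected size
```

The loop keeps the read-then-test shape and still collects all three elements, including the nil.
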
>>>>
>>>> Thanks,
>>>> Alistair
>>>>
>>>>
>>>>
>>>>> I have regularly benched Xtreams, but stopped a few years ago.
>>>>> Maybe I can dig the benchmarks up and run them on a newer VM.
>>>>>
>>>>> In the meantime, I had experimented with a programmable end of stream
>>>>> behavior (via a block, or any other valuable)
>>>>> http://www.squeaksource.com/XTream.htm
>>>>> so as to reconcile performance and universality, but it was a source of
>>>>> complexity on the implementation side.
>>>>>
>>>>> Nicolas
>>>>>
>>>>>>
>>>>>>
>>>>>>> Note also that Guille introduced something new, #closed, which is
>>>>>>> related to the difference between having no more elements (maybe right
>>>>>>> now, like an open network stream) and never ever being able to produce
>>>>>>> more data.
>>>>>>>
>>>>>>> Sven
>>>
>>>
>>
>
>
