Re: Unsafe hGetContents

2009-10-10 Thread Iavor Diatchki
Hello,

well, I think that the fact that we seem to have a program context
that can distinguish "f1" from "f2" is worth discussing because I
would have thought that in a pure language they are interchangeable.
The question is, does the context in Oleg's example really distinguish
between "f1" and "f2"?  You seem to be saying that this is not the
case:  in both cases you end up with the same non-deterministic
program that reads two numbers from the standard input and subtracts
them but you can't assume anything about the order in which the
numbers are extracted from the input---it is merely an artifact of the
GHC implementation that with "f1" the subtraction always happens the
one way, and with "f2" it happens the other way.

I can (sort of) buy this argument, after all, it is quite similar to
what happens with asynchronous exceptions (f1 (error "1") (error "2")
vs f2 (error "1") (error "2")).  Still, the whole thing does not
"smell right":  there is some impurity going on here, and trying to
offload the problem onto the IO monad only makes reasoning about IO
computations even harder (and it is pretty hard to start with).  So,
discussion and alternative solutions should be strongly encouraged, I
think.
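
The comparison with exceptions can be made concrete. The following is a
minimal sketch, not taken from the thread: it reuses the definitions of
f1 and f2 from Oleg's message, and the helper name whichError is an
assumption introduced here for illustration. It forces each result inside
IO and reports which of the two error calls surfaced. GHC would normally
be expected to report a different error for f1 than for f2, but the
language does not promise which one is raised, so no output is claimed.

```haskell
module Main where

import Control.Exception (ErrorCall (..), evaluate, try)

-- Same definitions as in Oleg's message: they differ only in the
-- order in which the arguments are forced.
f1, f2 :: Int -> Int -> Int
f1 e1 e2 = e1 `seq` e2 `seq` e1 - e2
f2 e1 e2 = e2 `seq` e1 `seq` e1 - e2

-- Hypothetical helper: force a value in IO and describe what happened.
whichError :: Int -> IO String
whichError x = do
  r <- try (evaluate x)
  pure $ case r of
    Left (ErrorCall msg) -> "error " ++ msg
    Right v              -> "value " ++ show v

main :: IO ()
main = do
  -- Which message is reported depends on evaluation order.
  whichError (f1 (error "1") (error "2")) >>= putStrLn
  whichError (f2 (error "1") (error "2")) >>= putStrLn
```

The difference is only observable from IO (through try), which is the
sense in which the problem is offloaded onto the IO monad.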

-Iavor







___
Haskell-prime mailing list
Haskell-prime@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-prime


Re: Unsafe hGetContents

2009-10-10 Thread Duncan Coutts
On Sat, 2009-10-10 at 02:51 -0700, o...@okmij.org wrote:

> > The reason it's hard is that to demonstrate a difference you have to get
> > the lazy I/O to commute with some other I/O, and GHC will never do that.
> 
> The keyword here is GHC. I may well believe that GHC is able to divine
> the programmer's true intent and so it always does the right thing. But
> writing in the language standard ``do what the version x.y.z of GHC
> does'' does not seem very appropriate, or helpful to other
> implementors.

With access to unsafeInterleaveIO it's fairly straightforward to show
that it is non-deterministic. These programs that bypass the safety
mechanisms on hGetContents just get us back to having access to the
non-deterministic semantics of unsafeInterleaveIO.
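
A minimal sketch of that claim, not taken from the thread: the names
tick and demo are assumptions introduced here, and f1 and f2 are copied
from Oleg's message. Two deferred actions share one counter, and a pure
function decides which of them is forced first. Under GHC one would
expect the two calls in main to print results of opposite sign, but
since the order of forcing is not specified, no output is claimed.

```haskell
module Main where

import Control.Exception (evaluate)
import Data.IORef (IORef, atomicModifyIORef', newIORef)
import System.IO.Unsafe (unsafeInterleaveIO)

f1, f2 :: Int -> Int -> Int
f1 e1 e2 = e1 `seq` e2 `seq` e1 - e2
f2 e1 e2 = e2 `seq` e1 `seq` e1 - e2

-- Hypothetical helper: hand out 1, 2, 3, ... in the order in which
-- the action is actually run.
tick :: IORef Int -> IO Int
tick ref = atomicModifyIORef' ref (\n -> (n + 1, n + 1))

-- Hypothetical helper: defer two ticks of the same counter, then let
-- the pure function f determine which one runs first.
demo :: (Int -> Int -> Int) -> IO Int
demo f = do
  ref <- newIORef 0
  a <- unsafeInterleaveIO (tick ref)
  b <- unsafeInterleaveIO (tick ref)
  evaluate (f a b)

main :: IO ()
main = do
  demo f1 >>= print
  demo f2 >>= print
```

Whichever deferred action is forced first receives 1 and the other
receives 2, so the result is either -1 or 1.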

> > Haskell's IO library is carefully designed to not run into this
> > problem on its own.  It's normally not possible to get two Handles
> > with the same FD...

> Is this behavior specified somewhere, or is this just an artifact
> of a particular GHC implementation?

It is in the Haskell 98 report, in the design of the IO library. It does
not mention FDs of course. The IO/Handle functions it provides give
no (portable) way to obtain two read handles on the same OS file
descriptor. The hGetContents behaviour of semi-closing is to stop you
from getting two lazy lists of the same read Handle.
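
The semi-closing barrier can be observed directly. This is a minimal
sketch, assuming GHC's base library; the name secondReadRejected is
introduced here for illustration. It calls hGetContents twice on stdin
and checks whether the second call is refused with an illegal-operation
error.

```haskell
module Main where

import Control.Exception (try)
import System.IO (hGetContents, stdin)
import System.IO.Error (isIllegalOperation)

-- Hypothetical helper: True when a second hGetContents on the same
-- handle is rejected as an illegal operation.
secondReadRejected :: IO Bool
secondReadRejected = do
  _ <- hGetContents stdin   -- stdin is semi-closed from here on
  r <- try (hGetContents stdin) :: IO (Either IOError String)
  pure (either isIllegalOperation (const False) r)

main :: IO ()
main = secondReadRejected >>= print
```

Opening "/dev/stdin" or "/dev/fd/0" as a separate Handle sidesteps this
check, because the library sees two unrelated handles.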

There's nothing semantically wrong with you bypassing those restrictions
(e.g. openFile "/dev/fd/0"); it just means you end up with a
non-deterministic IO program, which is something we typically try to
avoid.

I am a bit perplexed by this whole discussion. It seems to come down to
saying that unsafeInterleaveIO is non-deterministic and that things
implemented on top are also non-deterministic. The standard IO library
puts up some barriers to restrict the non-determinism, but if you walk
around the barrier then you can still find it. It's not clear to me what
is supposed to be surprising or alarming here.

Duncan



Unsafe hGetContents

2009-10-10 Thread oleg

Simon Marlow wrote:
> Ah yes, if you have two lazy input streams both referring to the same
> underlying stream, that is enough to demonstrate a problem.  As for
> whether Oleg's example is within the rules, it depends whether you
> consider fdToHandle as "unsafe"

I wasn't aware of the rules. Fortunately, UNIX (FreeBSD and Linux)
gives plenty of opportunities to shoot oneself in the foot. Here is the code from
the earlier message without the offending fdToHandle:

> {- Haskell98! -}
>
> module Main where
>
> import System.IO
>
> -- f1 and f2 are both pure functions, with the pure type.
> -- Both compute the result of the subtraction e1 - e2.
> -- The only difference between them is the sequence of
> -- evaluating their arguments, e1 `seq` e2 vs. e2 `seq` e1
> -- For really pure functions, that difference should not be observable
>
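> f1, f2 :: Int -> Int -> Int
>
> f1 e1 e2 = e1 `seq` e2 `seq` e1 - e2
> f2 e1 e2 = e2 `seq` e1 `seq` e1 - e2
>
> -- Parse the first whitespace-separated token of the input as an Int.
> read_int :: String -> Int
> read_int s = read . head . words $ s
>
> main :: IO ()
> main = do
>   let h1 = stdin
>   -- A second, independent Handle on the same underlying stream.
>   h2 <- openFile "/dev/stdin" ReadMode
>   s1 <- hGetContents h1
>   s2 <- hGetContents h2
>   -- print $ f1 (read_int s1) (read_int s2)
>   print $ f2 (read_int s1) (read_int s2)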

It exhibits the same behavior that was described in
http://www.haskell.org/pipermail/haskell/2009-March/021064.html

I think Windows may have something similar.


> The reason it's hard is that to demonstrate a difference you have to get
> the lazy I/O to commute with some other I/O, and GHC will never do that.

The keyword here is GHC. I may well believe that GHC is able to divine
the programmer's true intent and so it always does the right thing. But
writing in the language standard ``do what the version x.y.z of GHC
does'' does not seem very appropriate, or helpful to other
implementors.

> Haskell's IO library is carefully designed to not run into this
> problem on its own.  It's normally not possible to get two Handles
> with the same FD...

Is this behavior specified somewhere, or is this just an artifact
of a particular GHC implementation?
