Re: Re: Re: [Haskell-cafe] Re: A problem with par and modules boundaries...

2009-05-24 Thread Duncan Coutts
On Sat, 2009-05-23 at 20:42 -0400, Mario Blažević wrote:
 On Sat 23/05/09  2:51 PM , Duncan Coutts duncan.cou...@worc.ox.ac.uk sent:
  On Sat, 2009-05-23 at 13:31 -0400, Mario Blažević wrote:
  ...
  So the function is not strict, and I don't understand
  why GHC should evaluate the arguments before the call.
  
  Right, it's lazy in the first and strict in the second argument. As far
  as I can see we have no evidence that is is evaluating anything before
  the call.
 
 
 When I look at the Core definition of `test', it begins with
 
 
 \ (n1axl::integer:GHCziIntegerziInternals.Integer)
   (n2axn::integer:GHCziIntegerziInternals.Integer) -
 %let as1sU :: integer:GHCziIntegerziInternals.Integer =
base:DataziList.prod1
(main:Main.factors2
 (base:DataziList.prod1
  (base:GHCziNum.upzulist main:Main.lvl main:Main.lvl n1axl)
  base:DataziList.lvl1))
base:DataziList.lvl1
 %in %case integer:GHCziIntegerziInternals.Integer 
 (ghczmprim:GHCziPrim.parzh
@
 integer:GHCziIntegerziInternals.Integer
as1sU)
 %of (dsapq::ghczmprim:GHCziPrim.Intzh)

I recommend using -ddump-simpl, as it produces more readable output.

 To my untrained eyes, this looks like it's evaluating

  product $ factors $ product [1..n1])
 
 which is the first argument to `parallelize'.

That core code is doing:

 let blah = product $ factors $ product thelist
  in case par# blah of
   _ - ...

So it's calling the primitive par# but nothing is forced to WHNF (except
the unused dummy return value of par#).

 I assume that %case in Core evaluates the argument to WHNF, just like
 case in Haskell.

Yes.

(In fact case in core always reduces to WHNF, where as in Haskell case
(...) of bar - (...)  does not force anything.)

 Then again, I could be completely misinterpreting what Core is, because
 I can't find any call to `parallelize' before or after that. It appears
 to be inlined in Core, regardless of whether the pragma
 
  {-# INLINE parallelize #-}
 
 is there or not.

Yes, because even without the pragma, ghc decides it's profitable to
inline.

 Actually, I can't see any effect of that pragma in the
 core files whatsoever, but it certainly has effect on run time.

How about diffing the whole core output (and using -ddump-simpl). If
there's a performance difference then there must be a difference in the
core code too.

 Or do you mean to say that *your* installation of GHC behaves
 the same when the function `parallelize' is defined in the same module and 
 when
 it's imported?

Yes, exactly. With both ghc-6.10.1 and a very recent build of ghc-6.11

Duncan

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: Re: Re: Re: [Haskell-cafe] Re: A problem with par and modules boundaries...

2009-05-24 Thread Mario Blažević

 I recommend using -ddump-simpl, as it produces more readable output.
 
 Actually, I can't see any effect of that pragma in the
 core files whatsoever, but it certainly has effect on
 run time.
 
 How about diffing the whole core output (and using -ddump-simpl). If
 there's a performance difference then there must be a difference in the
 core code too.

Ok.

$ ghc-6.11.20090421 --make primes-test.hs -threaded -O2 -ddump-simpl  
main.simpl
$ time ./primes-test +RTS -N2
4001

real0m9.636s
user0m18.201s
sys 0m0.088s

$ ghc-6.11.20090421 --make primes-test.hs -threaded -O2 -ddump-simpl 
imported.simpl
$ time ./primes-test +RTS -N24001

real0m17.547s
user0m17.453s
sys 0m0.052s


I can't exactly use diff because the generated identifier names are not the 
same,
but after poring over with Emacs ediff I have found only one difference that's
not attributable to identifiers:

$diff main.simpl imported.simpl
...
223c232
   a_s1rs [ALWAYS Just L] :: GHC.Integer.Internals.Integer
---
   a_s1sV [ALWAYS Just S] :: GHC.Integer.Internals.Integer
...


Does this S vs. L difference have anything to do with strictness and laziness?
That line is a part of the `Main.test' definition:

$ diff -C 3 main.simpl imported.simpl
*** 217,244 
  [Arity 2
   Str: DmdType LL]
  Main.test =
!   \ (n1_ahR :: GHC.Integer.Internals.Integer)
! (n2_ahT :: GHC.Integer.Internals.Integer) -
  let {
!   a_s1rs [ALWAYS Just L] :: GHC.Integer.Internals.Integer
LclId
[Str: DmdType]
!   a_s1rs =
  Data.List.prod1
(Main.factors2
   (Data.List.prod1
! (GHC.Num.up_list Main.lvl Main.lvl n1_ahR) Data.List.lvl1))
Data.List.lvl1 } in
! case GHC.Prim.par# @ GHC.Integer.Internals.Integer a_s1rs
  of _ { __DEFAULT -
  case Data.List.prod1
 (Main.factors2
(Data.List.prod1
!  (GHC.Num.up_list Main.lvl Main.lvl n2_ahT) Data.List.lvl1))
 Data.List.lvl1
! of x1_aUS { __DEFAULT -
! case GHC.Real.$wdivMod x1_aUS a_s1rs of _ { (# ww1_aUo, _ #) -
! ww1_aUo
  }
  }
  }


 Or do you mean to say that *your* installation of GHC
 behaves the same when the function `parallelize' is defined in
 the same module and when it's imported?
 
 Yes, exactly. With both ghc-6.10.1 and a very recent build of ghc-6.11

As you can see, I'm using the 6.11.20090421 build. I looked at recent ones, but
the Linux builds seem to be less stable in May. I have the same results (though 
I
didn't use -ddump-simpl) with 6.8.2. Can you recommend a different version to 
try?


___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: Re: Re: Re: [Haskell-cafe] Re: A problem with par and modules boundaries...

2009-05-24 Thread Duncan Coutts
On Sun, 2009-05-24 at 12:48 -0400, Mario Blažević wrote:

  How about diffing the whole core output (and using -ddump-simpl). If
  there's a performance difference then there must be a difference in the
  core code too.

 I can't exactly use diff because the generated identifier names are not the 
 same,
 but after poring over with Emacs ediff I have found only one difference that's
 not attributable to identifiers:
 
 $diff main.simpl imported.simpl
 ...
 223c232
a_s1rs [ALWAYS Just L] :: GHC.Integer.Internals.Integer
 ---
a_s1sV [ALWAYS Just S] :: GHC.Integer.Internals.Integer
 ...

Good find!

 Does this S vs. L difference have anything to do with strictness and laziness?

Yes.

So, I think we should open a ghc but report with all the details,
particularly the example's source, the ghc version and that highlight of
that strictness difference.

Duncan

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Re: A problem with par and modules boundaries...

2009-05-23 Thread Mario Blažević

On Fri 22/05/09 10:51 AM , John Lato jwl...@gmail.com sent:
 Hi Mario,
 
 It looks like the parallelize function is getting inlined when it's in
 the same file, but not when it's in a separate file.
 
 Adding a {-# INLINE parallelize #-} pragma to the module with
 parallelize recovers all the performance for me.
 
 You could probably see exactly what's happening in more detail by
 going through the Core output.


Thank you, this advice helped. The Core output indicates that function `test'
evaluates the arguments to `parallelize' before it calls it. In other words, the
call to `parallelize' is optimized as a strict function call -- which it is. The
problem is that this optimization evaluates the arguments sequentially. 
Compiling
with optimizations turned off regains the parallel execution.

I guess I will report this as a GHC bug. Or is it a feature request?


___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Re: A problem with par and modules boundaries...

2009-05-23 Thread Don Stewart
mblazevic:
 
 On Fri 22/05/09 10:51 AM , John Lato jwl...@gmail.com sent:
  Hi Mario,
  
  It looks like the parallelize function is getting inlined when it's in
  the same file, but not when it's in a separate file.
  
  Adding a {-# INLINE parallelize #-} pragma to the module with
  parallelize recovers all the performance for me.
  
  You could probably see exactly what's happening in more detail by
  going through the Core output.
 
 
 Thank you, this advice helped. The Core output indicates that function `test'
 evaluates the arguments to `parallelize' before it calls it. In other words, 
 the
 call to `parallelize' is optimized as a strict function call -- which it is. 
 The
 problem is that this optimization evaluates the arguments sequentially. 
 Compiling
 with optimizations turned off regains the parallel execution.
 
 I guess I will report this as a GHC bug. Or is it a feature request?

As Duncan suggessted, try with GHC head (grab a snapshot). `par` et al
are much improved.

-- Don
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: Re: [Haskell-cafe] Re: A problem with par and modules boundaries...

2009-05-23 Thread Mario Blažević
 You could probably see exactly what's happening in
 more detail by going through the Core output.
 
 Thank you, this advice helped. The Core output indicates
 that function `test' evaluates the arguments to
 `parallelize' before it calls it. In other words, the
 call to `parallelize' is optimized as a strict function
 call -- which it is. The problem is that this
 optimization evaluates the arguments sequentially.
 Compiling with optimizations turned off regains the
 parallel execution.

 I guess I will report this as a GHC bug. Or is it a
 feature request?
 
 As Duncan suggessted, try with GHC head (grab a snapshot). `par` et al
 are much improved.

I already have, with the snapshot from 21st of April. It behaves the same
as 6.8.2, except it runs for twice as long.

I'd like to take back a part of what I said before, though: `parallelize' should
be strict only in its second argument. Its strictness in the first argument
should be the same as with `par`. Even though `parallelize x y'
always evaluates both x and y, the following test works fine with optimizations
even if `parallelize' is imported:

main = putStrLn (snd $ parallelize undefined Hello, World!)

So the function is not strict, and I don't understand why GHC should evaluate 
the
arguments before the call.

Does anybody know of a pragma or another way to make a function *non-strict* 
even
if it does always evaluate its argument? In other words, is there a way to
selectively disable the strictness optimization?


___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: Re: [Haskell-cafe] Re: A problem with par and modules boundaries...

2009-05-23 Thread Max Rabkin
On Sat, May 23, 2009 at 7:31 PM, Mario Blažević mblaze...@stilo.com wrote:
 Does anybody know of a pragma or another way to make a function *non-strict* 
 even
 if it does always evaluate its argument? In other words, is there a way to
 selectively disable the strictness optimization?

parallelize a b | False = (undefined, undefined)
   | otherwise = a `par` (b `pseq` (a, b))

might do, unless strictness analysis is smart enough to know that the
False guard is always, well, False.

--Max
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: Re: [Haskell-cafe] Re: A problem with par and modules boundaries...

2009-05-23 Thread Alexander Dunlap
On Sat, May 23, 2009 at 11:34 AM, Max Rabkin max.rab...@gmail.com wrote:
 On Sat, May 23, 2009 at 7:31 PM, Mario Blažević mblaze...@stilo.com wrote:
 Does anybody know of a pragma or another way to make a function *non-strict* 
 even
 if it does always evaluate its argument? In other words, is there a way to
 selectively disable the strictness optimization?

 parallelize a b | False = (undefined, undefined)
                   | otherwise = a `par` (b `pseq` (a, b))

 might do, unless strictness analysis is smart enough to know that the
 False guard is always, well, False.

 --Max
 ___

GHC.Prim.lazy?

Alex
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: Re: [Haskell-cafe] Re: A problem with par and modules boundaries...

2009-05-23 Thread Duncan Coutts
On Sat, 2009-05-23 at 13:31 -0400, Mario Blažević wrote:
  You could probably see exactly what's happening in
  more detail by going through the Core output.
  
  Thank you, this advice helped. The Core output indicates
  that function `test' evaluates the arguments to
  `parallelize' before it calls it. In other words, the
  call to `parallelize' is optimized as a strict function
  call -- which it is. The problem is that this
  optimization evaluates the arguments sequentially.
  Compiling with optimizations turned off regains the
  parallel execution.
 
  I guess I will report this as a GHC bug. Or is it a
  feature request?
  
  As Duncan suggessted, try with GHC head (grab a snapshot). `par` et al
  are much improved.
 
 I already have, with the snapshot from 21st of April. It behaves the same
 as 6.8.2, except it runs for twice as long.
 
 I'd like to take back a part of what I said before, though: `parallelize' 
 should
 be strict only in its second argument.

parallelize a b = a `par` (b `pseq` (a, b))

GHC infers that strictness of parallelize. It thinks that it is lazy in
both args (you can check this yourself using ghc --show-iface). That's
because the definitions of par and pseq use the function 'lazy':

-- The reason for the strange lazy call is that it fools
-- the compiler into thinking that pseq  and par are non-strict in
-- their second argument (even if it inlines pseq at the call site).
-- If it thinks pseq is strict in y, then it often evaluates
-- y before x, which is totally wrong.

pseq  x y = x `seq` lazy y
par  x y = case (par# x) of { _ - lazy y }

So GHC thinks that par is lazy in both arguments, while it thinks pseq
is strict in the first and lazy in the second.

 Its strictness in the first argument should be the same as with `par`.

Yes, which it is.

 Even though `parallelize x y' always evaluates both x and y,

Be careful about what you mean. Yes, it starts the evaluation of x in
parallel with y, but that does not mean it is strict in x, as you notice
with your example of undefined below.

 the following test works fine with optimizations even if `parallelize'
 is imported:
 
 main = putStrLn (snd $ parallelize undefined Hello, World!)
 
 So the function is not strict, and I don't understand why GHC should evaluate 
 the
 arguments before the call.

Right, it's lazy in the first and strict in the second argument. As far
as I can see we have no evidence that is is evaluating anything before
the call.

 Does anybody know of a pragma or another way to make a function *non-strict* 
 even
 if it does always evaluate its argument? In other words, is there a way to
 selectively disable the strictness optimization?

Yes, which is what pseq and par already do.

If there's a bug, we need to reproduce it and report it. I cannot
reproduce it.

Duncan

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Re: A problem with par and modules boundaries...

2009-05-23 Thread Mario Blažević
On Sat 23/05/09  4:15 PM , Alexander Dunlap alexander.dun...@gmail.com sent:
 Does anybody know of a pragma or another way to make a
 function *non-strict* even if it does always evaluate its
 argument? In other words, is there a way to
 selectively disable the strictness optimization?

 parallelize a b | False = (undefined, undefined)
                 | otherwise = a `par` (b `pseq` (a, b))

 might do, unless strictness analysis is smart enough to
 know that the False guard is always, well, False.
 ___
 
 GHC.Prim.lazy?

It's GHC.Exts.lazy nowadays, and it doesn't have any effect. Neither
does the `| False' guard. The only way I found to disable the eager
argument evaluation is to pass them in as functions:

 parallelize :: Num t = (t - a) - (t - b) - (a, b)
 parallelize a b = let a' = a 1
   b' = b 1
   in (a' `par` (b' `pseq` (a', b')))

Then it can be imported and called like this:

 test n1 n2 = let (p1, p2) = parallelize
(\n0- product $ factors $ product [n0..n1])
(\n0- product $ factors $ product [n0..n2])

This solution is incredibly fragile. If the declared type of parallelize 
is modified by replacing t by Integer, the evaluation becomes eager again.
Also, the argument functions can't simply take () for argument which
would make this solution reasonable.

If this is all the strictness optimizer's fault, I'm awed by how
difficult it is to trick it into not doing its job.


___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: Re: Re: [Haskell-cafe] Re: A problem with par and modules boundaries...

2009-05-23 Thread Mario Blažević
On Sat 23/05/09  2:51 PM , Duncan Coutts duncan.cou...@worc.ox.ac.uk sent:
 On Sat, 2009-05-23 at 13:31 -0400, Mario Blažević wrote:
 ...
 So the function is not strict, and I don't understand
 why GHC should evaluate the arguments before the call.
 
 Right, it's lazy in the first and strict in the second argument. As far
 as I can see we have no evidence that is is evaluating anything before
 the call.


When I look at the Core definition of `test', it begins with


\ (n1axl::integer:GHCziIntegerziInternals.Integer)
  (n2axn::integer:GHCziIntegerziInternals.Integer) -
%let as1sU :: integer:GHCziIntegerziInternals.Integer =
   base:DataziList.prod1
   (main:Main.factors2
(base:DataziList.prod1
 (base:GHCziNum.upzulist main:Main.lvl main:Main.lvl n1axl)
 base:DataziList.lvl1))
   base:DataziList.lvl1
%in %case integer:GHCziIntegerziInternals.Integer 
(ghczmprim:GHCziPrim.parzh
   @
integer:GHCziIntegerziInternals.Integer
   as1sU)
%of (dsapq::ghczmprim:GHCziPrim.Intzh)


To my untrained eyes, this looks like it's evaluating

 product $ factors $ product [1..n1])

which is the first argument to `parallelize'. I assume that %case in
Core evaluates the argument to WHNF, just like case in Haskell.

Then again, I could be completely misinterpreting what Core is, because
I can't find any call to `parallelize' before or after that. It appears
to be inlined in Core, regardless of whether the pragma

 {-# INLINE parallelize #-}

is there or not. Actually, I can't see any effect of that pragma in the
core files whatsoever, but it certainly has effect on run time.

 Does anybody know of a pragma or another way to make a
 function *non-strict* even if it does always evaluate its argument?
 In other words, is there a way to selectively disable the strictness
 optimization?
 
 Yes, which is what pseq and par already do.
 
 If there's a bug, we need to reproduce it and report it. I cannot
 reproduce it.

If you mean that you can't reproduce anything that's contrary to the
specification, that's not saying much: there are practically no guarantees on
what `par' is supposed to accomplish. If you mean you can't reproduce anything
you wouldn't expect, pray explain what is going on, because everybody else seems
to be surprised. Or do you mean to say that *your* installation of GHC behaves
the same when the function `parallelize' is defined in the same module and when
it's imported?


___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe