RE: optimization and rewrite rules questions

2009-02-26 Thread Simon Peyton-Jones
| II is where I'd like to be able to distinguish variables, constants,
| and complex expressions in the left-hand sides of RULES, and
| I and III are where I'd like control over the rewrite strategy, as
| in strategy combinators.

I'm deep in icfp submissions, so no time to reply properly.

You can distinguish between literals, variables etc, in GHC's BuiltinRules.  
These are not hard to write: see prelude/PrelRules.  But they are built in, not 
part of the source program.  Maybe that's ok, since you are messing with 
built-in arithmetic.


Another avenue is to elaborate the language of rules somehow to let you say 
what you want. But I don't know a good *spec* for such a feature, let alone an 
impl.

Another possibility is to write a Core-to-Core optimiser aimed at your target 
area.  Max B's plugins would let you dynamically link that to a distributed 
GHC.  But it's not in the HEAD yet...

Simon
___
Glasgow-haskell-users mailing list
Glasgow-haskell-users@haskell.org
http://www.haskell.org/mailman/listinfo/glasgow-haskell-users


Re: optimization and rewrite rules questions

2009-02-26 Thread Max Bolingbroke
2009/2/24 Claus Reinke claus.rei...@talk21.com:
 In the recently burried haskell-cafe thread speed: ghc vs gcc,
 Bulat pointed out some of the optimizations that GHC doesn't
 do, such as loop unrolling. I suggested a way of experimenting with loop
 unrolling, using template haskell to bypass GHC's blindspot (it usually
 doesn't unfold recursive definitions
 http://www.haskell.org/pipermail/glasgow-haskell-users/2007-July/012936.html
 ,
 but if we unfold a loop combinator at compile time, GHC's
 normal optimizations can take over from there):

 http://www.haskell.org/pipermail/haskell-cafe/2009-February/056241.html

Just a note - there is a solution that doesn't require Template
Haskell which I use in my own code. Here is a sketch:

fact = fix4 fact_worker

{-# INLINE fact_worker #-}
fact_worker recurse n
  | n = 0 = 1
  | otherwise = n * recurse (n - 1)

{-# INLINE fix4 #-}
fix4 f = f1
  where
f1 = f f2
f2 = f f3
f3 = f f4
f4 = f f1

There is probably a way to generalise this to arbitrary levels of
unrolling by using instances of a typeclass on type level numerals.

Cheers,
Max
___
Glasgow-haskell-users mailing list
Glasgow-haskell-users@haskell.org
http://www.haskell.org/mailman/listinfo/glasgow-haskell-users


Re: optimization and rewrite rules questions

2009-02-26 Thread Claus Reinke

| II is where I'd like to be able to distinguish variables, constants,
| and complex expressions in the left-hand sides of RULES, and
| I and III are where I'd like control over the rewrite strategy, as
| in strategy combinators.


I'm deep in icfp submissions, so no time to reply properly.


Okay, not urgent, just general concerns about GHC performance,
can wait a few days.

You can distinguish between literals, variables etc, in GHC's 
BuiltinRules.


Not really sufficient. If the loop body was about Maps instead
of Ints, different rules would apply. And the same limitations
apply for all uses of RULES, which were meant to provide
for user-extensible library-specific optimization by transformation.

Another avenue is to elaborate the language of rules somehow 
to let you say what you want. But I don't know a good *spec* 
for such a feature, let alone an impl.
Another possibility is to write a Core-to-Core optimiser aimed 
at your target area.  Max B's plugins would let you dynamically 
link that to a distributed GHC.  But it's not in the HEAD yet...


My guess would be to start from standard Strafunski/SYB-style
traversals over Core as the semantic API, utilizing compiler plugins.

Then look for better syntax, closer to the existing RULES (for a
start, something like quasi-quoting, so that left- and right-hand
sides of rules can still be written in concrete syntax, even if they
are just Haskell function definitions).

Then move all RULES to the new framework (to reduce
duplication inside GHC, and to provide a single programmer 
API) and start adding further optimizations to libraries (I wonder:

surely others have encountered these limitations when adding
RULES for their libraries?).

But that is just a guess - as you imply, someone would have to
sit down, design, implement, and test. That is why I asked 
whether it would make a good GSoC topic for GHC HQ

(lots of potential, and building on one of last year's topics).

Claus

___
Glasgow-haskell-users mailing list
Glasgow-haskell-users@haskell.org
http://www.haskell.org/mailman/listinfo/glasgow-haskell-users


Re: optimization and rewrite rules questions

2009-02-26 Thread Claus Reinke

but if we unfold a loop combinator at compile time, GHC's
normal optimizations can take over from there):

http://www.haskell.org/pipermail/haskell-cafe/2009-February/056241.html


Just a note - there is a solution that doesn't require Template
Haskell which I use in my own code. Here is a sketch:


That is in fact the same solution!-) Just that I stayed close to the
example in the original thread, hence a fixpoint-combinator with
implicit tail-recursion and built-in counter rather than one with
explicit general recursion.


fact = fix4 fact_worker

{-# INLINE fact_worker #-}
fact_worker recurse n
 | n = 0 = 1
 | otherwise = n * recurse (n - 1)

{-# INLINE fix4 #-}
fix4 f = f1
 where
   f1 = f f2
   f2 = f f3
   f3 = f f4
   f4 = f f1

There is probably a way to generalise this to arbitrary levels of
unrolling by using instances of a typeclass on type level numerals.


Semantically, one could compute the nested application without
meta-level help, but that involves another recursive definition, which 
GHC won't unfold during compilation. So I used TH, just to generate 
the equivalent to the 'fixN' definition. Since only the fixpoint/loop 
combinators need to be unfolded statically, one could indeed do it 
by hand, for a suitable range of unfolding depths, and provide them

as a library.

Claus

___
Glasgow-haskell-users mailing list
Glasgow-haskell-users@haskell.org
http://www.haskell.org/mailman/listinfo/glasgow-haskell-users


Type (class) recursion + families = exponential compile time?

2009-02-26 Thread Ben Franksen
Hi

the attached module is a much reduced version of some type-level assurance
stuff (inspired by the Lightweight Monadic Regions paper) I am trying to
do. I am almost certain that it could be reduced further but it is late and
I want to get this off my desk.

Note the 4 test functions, test11 .. test14. The following are timings for
compiling the module only with all test functions commented out, except
respectively, test11, test12, test13, and test14:

b...@sarun[1]  time ghc -c Bug2.hs
ghc -c Bug2.hs  1,79s user 0,04s system 99% cpu 1,836 total

b...@sarun[1]  time ghc -c Bug2.hs
ghc -c Bug2.hs  5,87s user 0,14s system 99% cpu 6,028 total

b...@sarun[1]  time ghc -c Bug2.hs
ghc -c Bug2.hs  23,52s user 0,36s system 99% cpu 23,899 total

b...@sarun[1]  time ghc -c Bug2.hs
ghc -c Bug2.hs  102,20s user 1,32s system 97% cpu 1:45,89 total

It seems something is scaling very badly. You really don't want to wait for
a version with 20 levels of nesting to compile...

If anyone has a good explanation for this, I'd be grateful.

BTW, I am not at all certain that this is ghc's fault, it may well be my
program, i.e. the constraints are too complex, whatever. I have no idea how
hard it is for the compiler to do all the unification. Also, the problem is
not of much practical relevance, as no sensible program will use more than
a handfull levels of nesting.

Cheers
Ben
{-# LANGUAGE Rank2Types, TypeSynonymInstances #-}
{-# LANGUAGE TypeFamilies, GeneralizedNewtypeDeriving #-}
module Bug2 where

import Control.Monad.Reader

newtype ResourceT r s m v = ResourceT { unResourceT :: ReaderT r m v }
  deriving (Monad)

data Ctx = Ctx

data Ch = Ch

type CAT s c = ResourceT [Ch] (s,c)

type CtxM c = ResourceT Ctx c IO

newtype CA s c v = CA { unCA :: CAT s c (CtxM c) v }
  deriving (Monad)

class (Monad m) = MonadCA m where
  type CtxLabel m

instance MonadCA (CA s c) where
  type CtxLabel (CA s c) = c

instance (Monad m, MonadCA m, c ~ CtxLabel m) = MonadCA  (CAT s c m) where
  type CtxLabel (CAT s c m) = c

runCAT :: (forall s. CAT s c m v) - m v
runCAT action = runReaderT (unResourceT action) []

newRgn :: MonadCA m = (forall s. CAT s (CtxLabel m) m v) - m v
newRgn = runCAT

runCA :: (forall s c. CA s c v) - IO v
runCA action = runCtxM (runCAT (unCA action))

runCtxM :: (forall c. CtxM c v) - IO v
runCtxM action = runReaderT (unResourceT action) Ctx

test11 :: IO ()
test11 = runCA(newRgn(newRgn(newRgn(newRgn(newRgn(
  newRgn(newRgn(newRgn(newRgn(return()))

-- test12 :: IO ()
-- test12 = runCA(newRgn(newRgn(newRgn(newRgn(newRgn(newRgn(
--   newRgn(newRgn(newRgn(newRgn(return(

-- test13 :: IO ()
-- test13 = runCA(newRgn(newRgn(newRgn(newRgn(newRgn(newRgn(newRgn(
--   newRgn(newRgn(newRgn(newRgn(return()

-- test14 :: IO ()
-- test14 = runCA(newRgn(newRgn(newRgn(newRgn(newRgn(newRgn(newRgn(newRgn(
--   newRgn(newRgn(newRgn(newRgn(return())

___
Glasgow-haskell-users mailing list
Glasgow-haskell-users@haskell.org
http://www.haskell.org/mailman/listinfo/glasgow-haskell-users


Int vs Word performance?

2009-02-26 Thread Claus Reinke

Looking at prelude/PrelRules.hs has reminded me of an old
conundrum: if I switch from Int to Word, should I expect any
performance differences?

A while ago, I needed lots of fairly small positive numbers,
together with a small number of flags for each, so I thought
I'd switch from Int to Word, and map the flags to bits. But 
the performance dropped so drastically that I went back

to Int, slightly complicating the bitmaps. I didn't expect that,
and I can't see any builtin rules for Int that would have no
Word counterpart.

Here is a trivial example with drastic difference between
T = Int and T = Word (~2.5x here):

   main = print $ foldl' (+) 0 [1..1::T]

What am I missing here?

Also, in the real code I ended up seeing things like this in
the -ddump-simpl output for the bit-fiddling code:

   GHC.Prim.word2Int#
   (GHC.Prim.and#
   (GHC.Prim.int2Word# wild13_XbE)
   (GHC.Prim.int2Word# y#_a4EZ))

Is that likely to cost me a lot or are these conversions cheap?

Claus

___
Glasgow-haskell-users mailing list
Glasgow-haskell-users@haskell.org
http://www.haskell.org/mailman/listinfo/glasgow-haskell-users


Re: Type (class) recursion + families = exponential compile time?

2009-02-26 Thread Lennart Augustsson
Just Hindley-Milner type inference is exponential, making the type
system more complex isn't going to make things better.

2009/2/26 Ben Franksen ben.frank...@online.de:
 Hi

 the attached module is a much reduced version of some type-level assurance
 stuff (inspired by the Lightweight Monadic Regions paper) I am trying to
 do. I am almost certain that it could be reduced further but it is late and
 I want to get this off my desk.

 Note the 4 test functions, test11 .. test14. The following are timings for
 compiling the module only with all test functions commented out, except
 respectively, test11, test12, test13, and test14:

 b...@sarun[1]  time ghc -c Bug2.hs
 ghc -c Bug2.hs  1,79s user 0,04s system 99% cpu 1,836 total

 b...@sarun[1]  time ghc -c Bug2.hs
 ghc -c Bug2.hs  5,87s user 0,14s system 99% cpu 6,028 total

 b...@sarun[1]  time ghc -c Bug2.hs
 ghc -c Bug2.hs  23,52s user 0,36s system 99% cpu 23,899 total

 b...@sarun[1]  time ghc -c Bug2.hs
 ghc -c Bug2.hs  102,20s user 1,32s system 97% cpu 1:45,89 total

 It seems something is scaling very badly. You really don't want to wait for
 a version with 20 levels of nesting to compile...

 If anyone has a good explanation for this, I'd be grateful.

 BTW, I am not at all certain that this is ghc's fault, it may well be my
 program, i.e. the constraints are too complex, whatever. I have no idea how
 hard it is for the compiler to do all the unification. Also, the problem is
 not of much practical relevance, as no sensible program will use more than
 a handfull levels of nesting.

 Cheers
 Ben

 ___
 Glasgow-haskell-users mailing list
 Glasgow-haskell-users@haskell.org
 http://www.haskell.org/mailman/listinfo/glasgow-haskell-users


___
Glasgow-haskell-users mailing list
Glasgow-haskell-users@haskell.org
http://www.haskell.org/mailman/listinfo/glasgow-haskell-users


Re: Int vs Word performance?

2009-02-26 Thread Don Stewart
claus.reinke:
 Looking at prelude/PrelRules.hs has reminded me of an old
 conundrum: if I switch from Int to Word, should I expect any
 performance differences?

 A while ago, I needed lots of fairly small positive numbers,
 together with a small number of flags for each, so I thought
 I'd switch from Int to Word, and map the flags to bits. But the 
 performance dropped so drastically that I went back
 to Int, slightly complicating the bitmaps. I didn't expect that,
 and I can't see any builtin rules for Int that would have no
 Word counterpart.

 Here is a trivial example with drastic difference between
 T = Int and T = Word (~2.5x here):

main = print $ foldl' (+) 0 [1..1::T]

 What am I missing here?

 Also, in the real code I ended up seeing things like this in
 the -ddump-simpl output for the bit-fiddling code:

GHC.Prim.word2Int#
(GHC.Prim.and#
(GHC.Prim.int2Word# wild13_XbE)
(GHC.Prim.int2Word# y#_a4EZ))

 Is that likely to cost me a lot or are these conversions cheap?

Those guys are no-ops, and in general you should never see a performance
difference. If you do, it is a bug!  There are some known cases where
rules are missing however:

  * Conversions to Int from Double for ceil, floor, round are missing rules.
http://hackage.haskell.org/trac/ghc/ticket/2271

  * gcd only has a specialised implementation for Int,
http://hackage.haskell.org/trac/ghc/ticket/2270

Some others I'm aware of are product/sum/maximum/minimum
on lists have specialisations for some atomic types (Int, Integer) but
not all (needs a ticket for this too).

I'm not aware of any remaining theoretically noop conversions that
aren't in fact implemented noops now. If you find them, please open a
ticket.

-- Don
___
Glasgow-haskell-users mailing list
Glasgow-haskell-users@haskell.org
http://www.haskell.org/mailman/listinfo/glasgow-haskell-users


Re: Type (class) recursion + families = exponential compile time?

2009-02-26 Thread Ben Lippmeier


Here's the reference
http://portal.acm.org/citation.cfm?id=96748

Deciding ML typability is complete for deterministic exponential  
time -- Harry G. Mairson.


Ben.


On 27/02/2009, at 10:12 AM, Ben Franksen wrote:


Hi

the attached module is a much reduced version of some type-level  
assurance
stuff (inspired by the Lightweight Monadic Regions paper) I am  
trying to
do. I am almost certain that it could be reduced further but it is  
late and

I want to get this off my desk.

Note the 4 test functions, test11 .. test14. The following are  
timings for
compiling the module only with all test functions commented out,  
except

respectively, test11, test12, test13, and test14:

b...@sarun[1]  time ghc -c Bug2.hs
ghc -c Bug2.hs  1,79s user 0,04s system 99% cpu 1,836 total

b...@sarun[1]  time ghc -c Bug2.hs
ghc -c Bug2.hs  5,87s user 0,14s system 99% cpu 6,028 total

b...@sarun[1]  time ghc -c Bug2.hs
ghc -c Bug2.hs  23,52s user 0,36s system 99% cpu 23,899 total

b...@sarun[1]  time ghc -c Bug2.hs
ghc -c Bug2.hs  102,20s user 1,32s system 97% cpu 1:45,89 total

It seems something is scaling very badly. You really don't want to  
wait for

a version with 20 levels of nesting to compile...

If anyone has a good explanation for this, I'd be grateful.

BTW, I am not at all certain that this is ghc's fault, it may well  
be my
program, i.e. the constraints are too complex, whatever. I have no  
idea how
hard it is for the compiler to do all the unification. Also, the  
problem is
not of much practical relevance, as no sensible program will use  
more than

a handfull levels of nesting.

Cheers
Ben
Bug2.hs___
Glasgow-haskell-users mailing list
Glasgow-haskell-users@haskell.org
http://www.haskell.org/mailman/listinfo/glasgow-haskell-users


___
Glasgow-haskell-users mailing list
Glasgow-haskell-users@haskell.org
http://www.haskell.org/mailman/listinfo/glasgow-haskell-users


Suggestion for bang patterns documentation

2009-02-26 Thread Brian Bloniarz

I got confused by the GHC documentation recently, I was wondering how
it could be improved. From:
http://www.haskell.org/ghc/docs/latest/html/users_guide/bang-patterns.html

 A bang only really has an effect if it precedes a variable or wild-card 
 pattern:
 f3 !(x,y) = [x,y]
 f4 (x,y)  = [x,y]
 Here, f3 and f4 are identical; putting a bang before a pattern that
 forces evaluation anyway does nothing.

The first sentence is true, but only in settings where the pattern is being
evaluated eagerly -- the bang in:
 f3 a = let !(x,y) = a in [1,x,y]
 f4 a = let (x,y) = a in [1,x,y]
has an effect.

The first time I read this, I took the first sentence to be a unqualified truth
and ended up thinking that !(x,y) was equivalent to (x,y) everywhere. Stuff
that comes later actually clarifies this, but I missed it.

What about making the distinction clear upfront? Something like:
 A bang in an eager pattern match only really has an effect if it precedes a 
 variable
 or wild-card pattern:
 f3 !(x,y) = [x,y]
 f4 (x,y)  = [x,y]
 Because f4 _|_ will force the evaluation of the pattern match anyway, f3 and 
 f4
 are identical; the bang does nothing.

It also might be a good idea to immediately follow this with the let/where 
usage:

 A bang can also preceed a let/where binding to make the pattern match strict. 
 For example:
 let ![x,y] = e in b
 is a strict pattern...
(in the existing docs, let comes a bit later):

Just a thought. Hopefully someone can come up with a better way of
wording what I'm getting at.

Thanks,
-Brian

_
Windows Live™ Hotmail®…more than just e-mail. 
http://windowslive.com/howitworks?ocid=TXT_TAGLM_WL_t2_hm_justgotbetter_howitworks_022009___
Glasgow-haskell-users mailing list
Glasgow-haskell-users@haskell.org
http://www.haskell.org/mailman/listinfo/glasgow-haskell-users