RE: newtypes and optimization

2007-12-13 Thread Simon Peyton-Jones
| However when I do this:
|
|  newtype Quaternion = Q (Vec4 Double)
|
| Everything is ruined. Functions like peek and vadd are no longer inlined,
| intermediate linked lists are created all over the place. The Quaternion
| Storable instance looks like this

Turns out this is a perf bug in 6.8 that I fixed a couple of weeks ago in the 
HEAD, but didn't merge.  (Implication constraints aren't getting INLINE 
pragmas.)

With the HEAD we get this, which should make you happy.  The HEAD allocates 
only 9kbytes in both -DSLOW and -DFAST, whereas 6.8 allocates 21kbytes in 
-DFAST (and off the map for -DSLOW).

I guess we should get this patch into the 6.8 branch.

Simon


$gpj --make -O2 -DFAST Test -o Test-fast
[1 of 2] Compiling VecMath  ( VecMath.hs, VecMath.o )
NOTE: Simplifier still going after 4 iterations; bailing out.  Size = 7311
[2 of 2] Compiling Main ( Test.hs, Test.o )
Linking Test-fast ...
bash-3.1$ rm -f *.o
bash-3.1$ $gpj --make -O2 -DSLOW Test -o Test-slow
[1 of 2] Compiling VecMath  ( VecMath.hs, VecMath.o )
NOTE: Simplifier still going after 4 iterations; bailing out.  Size = 7311
[2 of 2] Compiling Main ( Test.hs, Test.o )
Linking Test-slow ...
bash-3.1$ time ./Test-fast +RTS -sstderr
./Test-fast +RTS -sstderr
  9,432 bytes allocated in the heap
552 bytes copied during GC (scavenged)
  0 bytes copied during GC (not scavenged)
 32,768 bytes maximum residency (1 sample(s))

  1 collections in generation 0 (  0.00s)
  1 collections in generation 1 (  0.00s)

  1 Mb total memory in use

  INIT  time0.00s  (  0.00s elapsed)
  MUT   time5.88s  (  6.87s elapsed)
  GCtime0.00s  (  0.01s elapsed)
  EXIT  time0.00s  (  0.01s elapsed)
  Total time5.88s  (  6.88s elapsed)

  %GC time   0.0%  (0.1% elapsed)

  Alloc rate1,605 bytes per MUT second

  Productivity 100.0% of total user, 85.4% of total elapsed


real0m6.973s
user0m5.880s
sys 0m0.956s
bash-3.1$ time ./Test-slow +RTS -sstderr
./Test-slow +RTS -sstderr
  9,432 bytes allocated in the heap
552 bytes copied during GC (scavenged)
  0 bytes copied during GC (not scavenged)
 32,768 bytes maximum residency (1 sample(s))

  1 collections in generation 0 (  0.00s)
  1 collections in generation 1 (  0.00s)

  1 Mb total memory in use

  INIT  time0.00s  (  0.00s elapsed)
  MUT   time5.90s  (  6.83s elapsed)
  GCtime0.00s  (  0.03s elapsed)
  EXIT  time0.00s  (  0.03s elapsed)
  Total time5.90s  (  6.86s elapsed)

  %GC time   0.0%  (0.4% elapsed)

  Alloc rate1,597 bytes per MUT second

  Productivity 100.0% of total user, 86.0% of total elapsed


real0m6.958s
user0m5.904s
sys 0m1.004s
bash-3.1$

ghc --make -O2 -DFAST Test -o Test-682
[1 of 2] Compiling VecMath  ( VecMath.hs, VecMath.o )
[2 of 2] Compiling Main ( Test.hs, Test.o )
Linking Test-682 ...
bash-3.1$ time ./Test-682 +RTS -sstderr
./Test-682 +RTS -sstderr
 21,752 bytes allocated in the heap
552 bytes copied during GC (scavenged)
  0 bytes copied during GC (not scavenged)
 32,768 bytes maximum residency (1 sample(s))

  1 collections in generation 0 (  0.00s)
  1 collections in generation 1 (  0.00s)

  1 Mb total memory in use

  INIT  time0.00s  (  0.00s elapsed)
  MUT   time5.77s  (  6.69s elapsed)
  GCtime0.00s  (  0.00s elapsed)
  EXIT  time0.00s  (  0.00s elapsed)
  Total time5.77s  (  6.69s elapsed)

  %GC time   0.0%  (0.0% elapsed)

  Alloc rate3,770 bytes per MUT second

  Productivity 100.0% of total user, 86.2% of total elapsed


real0m6.787s
user0m5.768s
sys 0m1.016s
bash-3.1$
___
Glasgow-haskell-users mailing list
Glasgow-haskell-users@haskell.org
http://www.haskell.org/mailman/listinfo/glasgow-haskell-users


newtypes and optimization

2007-12-12 Thread Scott Dillard
Hi,

I have a statically-sized-list library that I use for linear algebra stuff.
It's got a vector type something like this:

 data V a b = V a b

so that a 3D vector is

 type Vec3 a = V a (V a (V a ()))

and I have type classes for operations on these things, like so:

 class VZipWith a b c u v w | {-lots of fundeps-} where
   vzipWith :: (a - b - c) - u - v - w

 instance VZipWith a b c (V a ()) (V b ()) (V c ()) where
   vzipWith f (V x ()) (V y ()) = V (f x y) ()

 instance
   VZipWith a b c (V a u) (V b v) (V c w)
   = VZipWith a b c (V a (V a u)) (V b (V b v)) (V c (V c w))
 where
   vzipWith f (V x u) (V y v) = V (f x y) (vzipWith f u v)

so that vector addition is

 vadd = vzipWith (+)

I put strictness annotations and INLINE pragmas all over the place, and GHC
does wonders with it. Using Storable instances something like the following,

 instance Storable a = Storable (V a ()) where
   sizeOf _ = sizeOf (undefined::a)
   peek p = peek (castPtr p) = \a - return (V a ())
   --etc

 instance (Storable a, Storable v) = Storable (V a v) where
   sizeOf _ = sizeOf (undefined::a) + sizeOf (undefined::v)
   peek p =
 a - peek (castPtr p)
 v - peek (castPtr (p`plusPtr`sizeOf(undefined::a)))
 return (V a v)

GHC can turn a loop like this,

 forM_ [0..n] $ \i -
   do a - peekElemOff aptr i
  b - peekElemOff bptr i
  pokeElemOff cptr i (vadd a b)

into something as fast as C, using no heap. You look at the core and its
nothing but readDoubleOffAddr#, +## and the like. I went so far as to generalize
this to matrices with things like vector-matrix and matrix-matrix
multiplication, determinants and all that and, when used in loops like above,
it's consistently as fast or even faster than C.


However when I do this:

 newtype Quaternion = Q (Vec4 Double)

Everything is ruined. Functions like peek and vadd are no longer inlined,
intermediate linked lists are created all over the place. The Quaternion
Storable instance looks like this

 instance Storable s = Storable (Quaternion s) where
   sizeOf _ = 4*sizeOf (undefined::s)
   peek p = peek (castPtr p :: Ptr (Vec4 s)) = \v - return (Q v)

with strictness annotations and INLINEs for everything. I also tried automatic
newtype deriving, with no luck. Why does a newtype defeat so much of the
optimization?

Thanks,
Scott
___
Glasgow-haskell-users mailing list
Glasgow-haskell-users@haskell.org
http://www.haskell.org/mailman/listinfo/glasgow-haskell-users


Re: newtypes and optimization

2007-12-12 Thread Stefan O'Rear
On Wed, Dec 12, 2007 at 11:02:15AM -0700, Scott Dillard wrote:
 with strictness annotations and INLINEs for everything. I also tried automatic
 newtype deriving, with no luck. Why does a newtype defeat so much of the
 optimization?
 
 Thanks,
 Scott

(Not a GHC developer, but someone fairly familiar with how the Simons
work)

What version of GHC are you using?  The implementation of newtypes was
completely redone in the 6.7.x period.

Do you have a fairly small complete working example?  If so, link to or
attach a tarball - will make their jobs much easier.

Stefan


signature.asc
Description: Digital signature
___
Glasgow-haskell-users mailing list
Glasgow-haskell-users@haskell.org
http://www.haskell.org/mailman/listinfo/glasgow-haskell-users


Re: newtypes and optimization

2007-12-12 Thread Don Stewart
stefanor:
 On Wed, Dec 12, 2007 at 11:02:15AM -0700, Scott Dillard wrote:
  with strictness annotations and INLINEs for everything. I also tried 
  automatic
  newtype deriving, with no luck. Why does a newtype defeat so much of the
  optimization?
  
  Thanks,
  Scott
 
 (Not a GHC developer, but someone fairly familiar with how the Simons
 work)
 
 What version of GHC are you using?  The implementation of newtypes was
 completely redone in the 6.7.x period.
 
 Do you have a fairly small complete working example?  If so, link to or
 attach a tarball - will make their jobs much easier.
 
 Stefan

Yeah, this sounds like maybe a bug, or maybe something wrong. We need to
investigate! An example please.

-- Don
___
Glasgow-haskell-users mailing list
Glasgow-haskell-users@haskell.org
http://www.haskell.org/mailman/listinfo/glasgow-haskell-users


Re: newtypes and optimization

2007-12-12 Thread Scott Dillard
You can find an example at:

graphics.cs.ucdavis.edu/~sdillard/newtype-maybe-bug.tar.gz

Here's my session with it:

ghc --make -O2 -DFAST Test
time ./Test +RTS -tstderr

ghc: 21120 bytes, 2 GCs, 32768/32768 avg/max bytes residency (1
samples), 1M in use, 0.00 INIT (0.00 elapsed), 7.52 MUT (8.70
elapsed), 0.00 GC (0.00 elapsed) :ghc

real0m8.827s
user0m7.520s
sys 0m1.224s



ghc --make -O2 -DSLOW Test -no-recomp
time ./Test +RTS -tstderr

Ctrl-C

Test: interrupted
ghc: 13476104656 bytes, 25762 GCs, 36864/36864 avg/max bytes
residency (1 samples), 1M in use, 0.00 INIT (0.00 elapsed), 91.19 MUT
(92.85 elapsed), 0.22 GC (0.36 elapsed) :ghc

real1m33.232s
user1m31.410s
sys 0m1.020s


If you do

  ghc -c -O2 -DFAST -ddump-simpl Test.hs | grep VecMath.V

you won't find any occurrences/pattern matches on the constructor, but with

  ghc -c -O2 -DSLOW -ddump-simpl Test.hs | grep VecMath.V

you'll see lots, due to a call to peek that is not inlined.

This is with ghc-6.8.1 and ghc-6.8.2 (no difference)


Thanks for any light you can shed on this,
Scott




On Dec 12, 2007 1:48 PM, Don Stewart [EMAIL PROTECTED] wrote:
 stefanor:

  On Wed, Dec 12, 2007 at 11:02:15AM -0700, Scott Dillard wrote:
   with strictness annotations and INLINEs for everything. I also tried 
   automatic
   newtype deriving, with no luck. Why does a newtype defeat so much of the
   optimization?
  
   Thanks,
   Scott
 
  (Not a GHC developer, but someone fairly familiar with how the Simons
  work)
 
  What version of GHC are you using?  The implementation of newtypes was
  completely redone in the 6.7.x period.
 
  Do you have a fairly small complete working example?  If so, link to or
  attach a tarball - will make their jobs much easier.
 
  Stefan

 Yeah, this sounds like maybe a bug, or maybe something wrong. We need to
 investigate! An example please.

 -- Don

___
Glasgow-haskell-users mailing list
Glasgow-haskell-users@haskell.org
http://www.haskell.org/mailman/listinfo/glasgow-haskell-users