RE: behaviour of {-# NOINLINE #-} in where clauses

2006-08-14 Thread Simon Peyton-Jones
[Narrowing to ghc users only]

That's odd.  I tried it (the HEAD) and it worked fine.  Input and output
below

You'll notice that the binding for 'realloc' got floated into the branch
of the case (that's FloatIn), but it is never inlined.

| Even so, I kind of wish there were a stage between STG and CMM that
| showed the imperative model of STG with linear layout, control flow
and
| notes to indicate thunk/closure allocations. I expect most of my
problem
| is that I do not understand the STG evaluation model sufficiently well
| to see how it maps to basic blocks, jumps/calls etc.

Try -ddump-prep.  It's essentially STG with a bit less clutter.

Simon

=== Input =

{-# OPTIONS -fglasgow-exts #-}

module Foo7 where

data Thing = One Thing | The Bool

loop xs ys =
  case xs of
One thing - loop thing ys
The other - case realloc of
True - False
False - True


  where
{-# NOINLINE realloc #-}
realloc = case ys of
One thing - True
The other - False


=== Output =

Rec {
Foo7.loop :: Foo7.Thing - Foo7.Thing - GHC.Base.Bool
[GlobalId]
[Arity 2
 NoCafRefs
 Str: DmdType SS]
Foo7.loop =
  \ (xs_add :: Foo7.Thing) (ys_ade :: Foo7.Thing) -
case xs_add of wild_B1 {
  Foo7.One thing_adv - Foo7.loop thing_adv ys_ade;
  Foo7.The other_adz -
let {
  realloc_seH :: GHC.Base.Bool
  [Str: DmdType]
  realloc_seH =
case ys_ade of wild1_Xc {
  Foo7.One thing_adn - GHC.Base.True; Foo7.The other1_adr
- GHC.Base.False
}
} in 
  case realloc_seH of wild1_Xe {
GHC.Base.False - GHC.Base.True; GHC.Base.True -
GHC.Base.False
  }
}
end Rec }



| -Original Message-
| From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED]
| On Behalf Of Duncan Coutts
| Sent: 13 August 2006 23:34
| To: GHC-users list; GHC-bugs list
| Subject: behaviour of {-# NOINLINE #-} in where clauses
| 
| This doesn't have the effect I expected:
| 
| loop xs =
|   case blah of
| One thing - ... loop
| The other - ... realloc ...
| 
| 
|   where
| {-# NOINLINE realloc #-}
| realloc = do
|   something
|   loop ...
| 
| My intention here was that the loop would not contain the code for
| realloc and that it'd be done as a call at the cmm level. My intention
| is to take the slow and rarely taken realloc path out of the code for
| the fast path.
| 
| It seems the {-# NOINLINE realloc #-} pagma did not have the effect I
| intended. Looking at the -ddump-simpl and -ddump-cmm, the code for the
| realloc gets expanded in place in a branch of a case statement. In the
| cmm code we end up with just what I didn't want:
| 
| loop_info:
| if (offset != 4096) goto later;
| ...
| ... lots of realloc code taking up space
| ... in the instruction / trace cache
| ...
| later:
| .. do the fast bits, read a byte, write a byte
| jump loop_info;
| 
| 
| Not only does the slow path take up space but it's in the location
| favoured by the hardware's static branch prediction.
| 
| Reversing the test doesn't help because either way ghc turns it into:
| 
| case thing of
|   _DEFAULT -
|   4096 -
| 
| and from that generates CMM:
| 
| if (thing != 4096) goto much_later;
| ...
| much_later:
| ...
| 
| 
| The reason I was looking at this is because I've been trying to figure
| out why our lazy byte string fusion primitives are much slower than
the
| strict versions. It's improving though, it's now only half the speed
| rather than a tenth of the speed. :-)
| 
| The ByteString.Lazy code is an interesting mixture of strict and lazy.
| We must strictly read/write the chunks but lazily generate/consume the
| list of chunks.
| 
| I just discovered that I should have been reading STG all along rather
| than core from the simplifier or CMM. STG takes out all the type
| annotations which tend to make things quite verbose. Mind you, seeing
| the types can be handy too to see if/how things are unboxed.
| 
| Even so, I kind of wish there were a stage between STG and CMM that
| showed the imperative model of STG with linear layout, control flow
and
| notes to indicate thunk/closure allocations. I expect most of my
problem
| is that I do not understand the STG evaluation model sufficiently well
| to see how it maps to basic blocks, jumps/calls etc.
| 
| 
| Duncan
| 
| ___
| Glasgow-haskell-bugs mailing list
| Glasgow-haskell-bugs@haskell.org
| http://www.haskell.org/mailman/listinfo/glasgow-haskell-bugs
___
Glasgow-haskell-users mailing list
Glasgow-haskell-users@haskell.org
http://www.haskell.org/mailman/listinfo/glasgow-haskell-users


Re: [Haskell] Haskell Workshop 2006 Call for participation

2006-08-14 Thread Bulat Ziganshin
Monday, August 14, 2006, 10:34:22 AM, Andres Loeh wrote:

14:30 Simon Marlow (Microsoft Research)
  An Extensible Dynamically-Typed Hierarchy of Exceptions

is this planned to be included in ghc 6.6? 6.8?

15:00 David Himmelstrup (Denmark)
  Demo: Interactive Debugging with GHCi

and this?





-- 
Best regards,
 Bulatmailto:[EMAIL PROTECTED]

___
Glasgow-haskell-users mailing list
Glasgow-haskell-users@haskell.org
http://www.haskell.org/mailman/listinfo/glasgow-haskell-users


Re: behaviour of {-# NOINLINE #-} in where clauses

2006-08-14 Thread Bulat Ziganshin
Hello Duncan,

Monday, August 14, 2006, 2:33:31 AM, you wrote:

 Reversing the test doesn't help because either way ghc turns it into:

 case thing of
   _DEFAULT -
   4096 -

to be exact, ghc passes code for default case separately from code for
other cases. look at emitSwitch procedure to know why it's required

 and from that generates CMM:

 if (thing != 4096) goto much_later;
 ...
 much_later:
 ...

can't you use 'if' expressions?

 Even so, I kind of wish there were a stage between STG and CMM that
 showed the imperative model of STG with linear layout, control flow and
 notes to indicate thunk/closure allocations. I expect most of my problem
 is that I do not understand the STG evaluation model sufficiently well
 to see how it maps to basic blocks, jumps/calls etc.

i once tried to understood STG-CMM code generation. it's all in
codeGen directory of ghc sources. and now i think that STG is pretty
low-level imperative language, not harder than C for example. i even
had the idea to write STG-C translator that generates efficient code.
on the other side, jhc already contains pretty the same thing (John
claims that jhc's internal language is close to STG)


-- 
Best regards,
 Bulatmailto:[EMAIL PROTECTED]

___
Glasgow-haskell-users mailing list
Glasgow-haskell-users@haskell.org
http://www.haskell.org/mailman/listinfo/glasgow-haskell-users


Re[8]: Replacement for GMP: Update

2006-08-14 Thread Bulat Ziganshin
Hello skaller,

Sunday, August 13, 2006, 4:34:14 AM, you wrote:

 I know very little about Haskell, let alone GHC internals

me too. so it's better to wait for comments about your thoughts from
GHC team than from me. but at least i can said that

 But the state of the art is then two stages behind the
 requirement: Haskell still has to 'world stop' threads
 to do a major collection.

is not exactly true. look at Non-stop Haskell
(http://www.haskell.org/~simonmar/papers/nonstop.pdf)

i don't know why it is not included in 6.6 or previous version

 So I'm bringing into question whether these nice
 'optimisations' are actually worthwhile. They actually
 seem to *degrade* performance, not improve it, when we're
 running with a large number of CPUs. Stopping the world
 if you have 20,000 CPU's will happen so often, all the
 CPU's will be idle 99.99% of the time :)

btw, one GHC intern worked on multi-processor GC and i hope that it
will be included in 6.6. so, the GC will also use all these 20k cpus :)
or Intel/IBM/Sun will start make some FP chips. they already do this
actually, just these chips still emulate x86/sparc/... ones :)




-- 
Best regards,
 Bulatmailto:[EMAIL PROTECTED]

___
Glasgow-haskell-users mailing list
Glasgow-haskell-users@haskell.org
http://www.haskell.org/mailman/listinfo/glasgow-haskell-users


RE: behaviour of {-# NOINLINE #-} in where clauses

2006-08-14 Thread Duncan Coutts
On Mon, 2006-08-14 at 10:12 +0100, Simon Peyton-Jones wrote:
 [Narrowing to ghc users only]
 
 That's odd.  I tried it (the HEAD) and it worked fine.  Input and output
 below
 
 You'll notice that the binding for 'realloc' got floated into the branch
 of the case (that's FloatIn), but it is never inlined.

Ok, I'll try with the latest head.

In my code the realloc name has disappeared in the -ddump-simpl and the
only remaining 'let' expressions are for other things (for allocating
prim byte arrays and ForeignPtrContents).

 | Even so, I kind of wish there were a stage between STG and CMM that
 | showed the imperative model of STG with linear layout, control flow
 and
 | notes to indicate thunk/closure allocations. I expect most of my
 problem
 | is that I do not understand the STG evaluation model sufficiently well
 | to see how it maps to basic blocks, jumps/calls etc.
 
 Try -ddump-prep.  It's essentially STG with a bit less clutter.

Right'o. Thanks.

 === Input =
 
 {-# OPTIONS -fglasgow-exts #-}
 
 module Foo7 where
 
 data Thing = One Thing | The Bool
 
 loop xs ys =
   case xs of
 One thing - loop thing ys
 The other - case realloc of
   True - False
   False - True
 
 
   where
 {-# NOINLINE realloc #-}
 realloc = case ys of
   One thing - True
   The other - False
 
 
 === Output =
 
 Rec {
 Foo7.loop :: Foo7.Thing - Foo7.Thing - GHC.Base.Bool
 [GlobalId]
 [Arity 2
  NoCafRefs
  Str: DmdType SS]
 Foo7.loop =
   \ (xs_add :: Foo7.Thing) (ys_ade :: Foo7.Thing) -
 case xs_add of wild_B1 {
   Foo7.One thing_adv - Foo7.loop thing_adv ys_ade;
   Foo7.The other_adz -
   let {
 realloc_seH :: GHC.Base.Bool
 [Str: DmdType]
 realloc_seH =
   case ys_ade of wild1_Xc {
 Foo7.One thing_adn - GHC.Base.True; Foo7.The other1_adr
 - GHC.Base.False
   }
   } in 
 case realloc_seH of wild1_Xe {
   GHC.Base.False - GHC.Base.True; GHC.Base.True -
 GHC.Base.False
 }
 }
 end Rec }

Yes, that looks more like what I want, I think.

Duncan

___
Glasgow-haskell-users mailing list
Glasgow-haskell-users@haskell.org
http://www.haskell.org/mailman/listinfo/glasgow-haskell-users