Re: [Haskell-cafe] Re: Real-time garbage collection for Haskell

2010-03-08 Thread Curt Sampson
On 2010-03-06 12:42 + (Sat), Simon Marlow wrote:

 Usually I find keeping the nursery size (-A) close to the L2 cache size  
 works best, although sometimes making it really big can be even better.

Interesting to know. I got the impression that I was being encouraged to
keep -A closer to the L1 cache size, myself.

 -qg disables parallel GC completely.  This is usually terrible for  
 locality, because every GC will move all the recently allocated data  
 from each CPU's L2 cache into the cache of the CPU doing GC, where it  
 will have to be fetched out again after GC.

I've since explained to Cranshaw (we are getting to have *way* too many
'Simon's around here) about the issues with our different machines; some
of this depends on the host on which we're doing the testing.

* Our Core i7 hosts share 8 MB of L3 cache amongst four cores with
two threads each. Thus, no locality penalties here.

* Our Xeon E5420 host has two 4-core CPUs, and each pair of cores
shares a 6 MB L2 cache. Thus there's a pretty good chance that
something you need is in someone else's cache. I don't know if
there's any difference between moving stuff between two caches on
the same CPU and two caches on different CPUs.

* Our Xeon E5520 host has two 4-core CPUs, each core of which has
two threads. Each CPU (4 cores) shares an 8 MB L3 cache. Thus,
presumably, less locality penalty than the E5420 but more than an
i7. As a side note, I also see slightly less memory bandwidth on
this system (for both caches and main memory) than I do on an i7.

This gets complex pretty fast. And don't ask me about Intel's new style
of having L1 and L3 or L2 and L3 caches rather than L1 and L2 caches.

 -qb disables load-balancing in the parallel GC, which improves locality  
 at the expense of parallelism, usually I find it is an improvement in  
 parallel programs.

I'd think so too. Figuring out what went on here is going to have to
wait until I get more detailed GC information in the eventlog.

Followups to glasgow-haskell-us...@haskell.org.

cjs
-- 
Curt Sampson c...@cynic.net +81 90 7737 2974
 http://www.starling-software.com
The power of accurate observation is commonly called cynicism
by those who have not got it.--George Bernard Shaw
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Re: Real-time garbage collection for Haskell

2010-03-04 Thread Curt Sampson
On 2010-03-02 14:17 + (Tue), Simon Marlow wrote:

 System.Mem.performGC is your friend, but if you're unlucky it might do a  
 major GC and then you'll get more pause than you bargained for.

Anybody calling that is a really poor unlucky sod, because, as far as I
can tell from reading the sources, System.Mem.performGC always does a
major GC.

cjs
-- 
Curt Sampson c...@cynic.net +81 90 7737 2974
 http://www.starling-software.com
The power of accurate observation is commonly called cynicism
by those who have not got it.--George Bernard Shaw
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Re: Real-time garbage collection for Haskell

2010-03-04 Thread Peter Verswyvelen
A fully concurrent GC running on multiple threads/cores might be
great, but I guess this is difficult to implement and introduces a lot
of overhead.

For simple video games, it might work to always do a full GC per
frame, but don't allow it to take more than T milliseconds. In a sense
the GC function should be able to be paused and resumed, but it could
run on the same thread as the game loop, so no synchronization
overhead would arise (in a sense many video games are already little
cooperative multitasking systems). The game loop should then decide
how to balance the time given to the GC and the memory being
collected. This would cause longer frame times and hence sometimes a
decrease in frame rate, but never long stalls.

Note that the issue also popped up for me in C many years ago. Using
Watcom C/C++ in the nineties, I found out that a call to the free
function could take a very long time. Also for games that could run
many hours or days (e.g. a multi player game server) the C heap could
get very fragmented, causing memory allocations and deallocations to
take ever more time, sometimes even  fail. To make my games run
smoothly I had to implement my own memory manager. To make them run
for a very long time, I had to implement a memory defragmenter. So in
the end, this means a garbage collector :-) Of course this problem is
well known, I just wanted to re-state that for games the typical C
heap is not very well suitable either.
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Re: Real-time garbage collection for Haskell

2010-03-04 Thread wren ng thornton

Simon Marlow wrote:

So it would be pretty easy to provide something like

  disableMajorGC, enableMajorGC :: IO ()

Of course leaving it disabled too long could be bad, but that's your 
responsibility.


It seems like it'd be preferable to have an interface like:

withMajorGCDisabled :: IO() - IO()

or (for some definition of K):

withMajorGCDisabled :: (K - IO()) - IO()

in order to ensure that it always gets turned back on eventually. Of 
course, the latter can be created from the former pair. It's just that 
the former reminds me a bit much of explicit memory management and how 
difficult it is to balance the free()s...


--
Live well,
~wren
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Re: Real-time garbage collection for Haskell

2010-03-03 Thread Neil Davies


On 2 Mar 2010, at 21:38, Simon Marlow wrote:


On 02/03/10 20:37, Luke Palmer wrote:
On Tue, Mar 2, 2010 at 7:17 AM, Simon Marlowmarlo...@gmail.com   
wrote:

For games,
though, we have a very good point that occurs regularly where we  
know
that all/most short-lived objects will no longer be referenced -  
at the

start of a fresh frame.


System.Mem.performGC is your friend, but if you're unlucky it  
might do a

major GC and then you'll get more pause than you bargained for.


Some fine-grained control might be nice here.  Eg. I could do a major
GC as a player is opening a menu, on a loading screen, when the game
is paused, or some other key points, and it might still be annoying,
but at least it wouldn't interfere with gameplay.  There is of course
the question of what happens if one of these key points doesn't  
happen

when we need to do an allocation, but... oh well.  Perhaps that could
be mitigated by saying I would rather you allocate than major GC
right now.  Are any of these options impossible, or be unreasonably
difficult to implement (I don't suspect so)?


Actually that's one thing we can do relatively easily, i.e. defer  
major GC for a while.  Due to the way GHC has a two-layer memory  
manager, the heap is a list of discontiguous blocks, so we can  
always allocate some more memory.


So it would be pretty easy to provide something like

 disableMajorGC, enableMajorGC :: IO ()

Of course leaving it disabled too long could be bad, but that's your  
responsibility.


Oh, I just checked and System.Mem.performGC actually performs a  
major GC, here's its implementation:


foreign import ccall {-safe-} performMajorGC performGC :: IO ()

to perform a minor GC (or possibly a major GC if one is due), you  
want this:


foreign import ccall {-safe-} performGC performMinorGC :: IO ()

Cheers,
Simon
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Is there a similar set of runes to be able to see how much mutation  
has occurred, how much was live last GC, etc?


Neil

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Re: Real-time garbage collection for Haskell

2010-03-03 Thread Simon Marlow

On 03/03/2010 08:43, Neil Davies wrote:


On 2 Mar 2010, at 21:38, Simon Marlow wrote:


On 02/03/10 20:37, Luke Palmer wrote:

On Tue, Mar 2, 2010 at 7:17 AM, Simon Marlowmarlo...@gmail.com wrote:

For games,
though, we have a very good point that occurs regularly where we know
that all/most short-lived objects will no longer be referenced - at
the
start of a fresh frame.


System.Mem.performGC is your friend, but if you're unlucky it might
do a
major GC and then you'll get more pause than you bargained for.


Some fine-grained control might be nice here. Eg. I could do a major
GC as a player is opening a menu, on a loading screen, when the game
is paused, or some other key points, and it might still be annoying,
but at least it wouldn't interfere with gameplay. There is of course
the question of what happens if one of these key points doesn't happen
when we need to do an allocation, but... oh well. Perhaps that could
be mitigated by saying I would rather you allocate than major GC
right now. Are any of these options impossible, or be unreasonably
difficult to implement (I don't suspect so)?


Actually that's one thing we can do relatively easily, i.e. defer
major GC for a while. Due to the way GHC has a two-layer memory
manager, the heap is a list of discontiguous blocks, so we can always
allocate some more memory.

So it would be pretty easy to provide something like

disableMajorGC, enableMajorGC :: IO ()

Of course leaving it disabled too long could be bad, but that's your
responsibility.

Oh, I just checked and System.Mem.performGC actually performs a major
GC, here's its implementation:

foreign import ccall {-safe-} performMajorGC performGC :: IO ()

to perform a minor GC (or possibly a major GC if one is due), you want
this:

foreign import ccall {-safe-} performGC performMinorGC :: IO ()

Cheers,
Simon
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Is there a similar set of runes to be able to see how much mutation has
occurred, how much was live last GC, etc?


No, but there could be - the information is collected by the RTS, it 
just isn't made available via a public API right now.


Cheers,
Simon

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Re: Real-time garbage collection for Haskell

2010-03-02 Thread Malcolm Wallace
Both concurrent GC and incremental GC tend to add overheads to the  
mutator, because they need a read barrier.  There was an incremental  
GC for GHC once [1], taking advantage of the built-in read barrier  
that we have whereby most closures are entered


Was there a specific reason why that GC implementation chose to use a  
read barrier rather than a write barrier?  I would have thought that  
in general, a write barrier is cheaper to implement.  Doesn't ghc  
update fewer thunks than it enters?


Regards,
Malcolm




___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Re: Real-time garbage collection for Haskell

2010-02-28 Thread Derek Elkins
On Sun, Feb 28, 2010 at 10:03 AM, Heinrich Apfelmus
apfel...@quantentunnel.de wrote:
 Luke Palmer wrote:
 I have seen some proposals around here for SoC projects and other
 things to try to improve the latency of GHC's garbage collector.  I'm
 currently developing a game in Haskell, and even 100ms pauses are
 unacceptable for a real-time game.  I'm calling out to people who have
 seen or made such proposals, because I would be willing to contribute
 funding and/or mentor a project that would contribute to this goal.

 Also any ideas for reducing this latency in other ways would be very
 appreciated.

 Overly long garbage collection might also be a sign of space leaks.

 But there are many other things that can go wrong in a real time system
 and might explain your delays. For example, you might need to avoid
 amortized time data structures like  Data.Sequence . Or for physics
 simulations, you'd need to fix the time step ∆t, as described in

   http://gafferongames.com/game-physics/fix-your-timestep/

 or numerical integration will deteriorate rather quickly.

Incidentally, what's described there is a simplified version of the
frequency locked loops described in Massalin's thesis on Synthesis OS,
and it is used there for about the same purpose in a soft real-time
context.

http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.29.4871
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe