Re: [Haskell-cafe] Re: GHC's parallel garbage collector -- what am I doing wrong?

2010-03-07 Thread Simon Marlow

On 07/03/10 14:41, Jan-Willem Maessen wrote:


> On Mar 3, 2010, at 8:44 AM, Simon Marlow wrote:
>
>> On 01/03/2010 21:20, Michael Lesniak wrote:
>>> Hello Bryan,
>>>
>>>> The parallel GC currently doesn't behave well with concurrent
>>>> programs that use multiple capabilities (aka OS threads), and
>>>> the behaviour you see is the known symptom of this. I believe
>>>> that Simon Marlow has some fixes in hand that may go into
>>>> 6.12.2.
>>
>> It's more correct to say the parallel GC has difficulty when one of
>> its threads is descheduled by the OS, because the other threads
>> just spin waiting for it.  Presumably some kernels are more
>> susceptible than others due to differences in scheduling policy; I
>> know they've been fiddling around with this a lot in Linux
>> recently.
>>
>> You typically don't see a problem when there are spare cores; the
>> slowdown manifests when you are trying to use all the cores in your
>> machine, so it affects people on dual-cores quite a lot. This
>> probably explains why I've not been particularly affected by this
>> myself, since I do most of my benchmarking on an 8-core box.
>>
>> The fix that will be in 6.12.2 is to insert some yields, so that
>> threads will yield rather than spinning indefinitely, and this
>> seems to help a lot.
>
> Be warned that inserting yield into a spin loop is also non-portable,
> and may make the problem *worse* on some systems.
>
> The problem is that "yield" calls can be taken by the scheduler to
> mean "See, I'm a nice thread, giving up the core when I don't need
> it.  Please give me extra Scheduling Dubloons."
>
> Now let's say 7 of your 8 threads are doing this.  It's likely that
> each one will yield to the next, and the 8th thread (the one you
> actually want on-processor) could take a long time to bubble up and
> get its moment.  At one time on Solaris you could even livelock
> (because the scheduler didn't try particularly hard to be fair in the
> case of multiple yielding threads in a single process)---but that was
> admittedly a long time ago.

How depressing, thanks for that :)

> The only recourse I know about is to tell the OS you're doing
> synchronization (by using OS-visible locking calls, say the ones in
> pthreads or some of the lightweight calls that Linux has added for
> the purpose).  Obviously this has a cost if anyone falls out of the
> spin loop---and it's pretty likely some thread will have to wait a
> while.


Yes, so we tried using futexes on Linux; there's an experimental patch 
attached to

http://hackage.haskell.org/trac/ghc/ticket/3553

It was definitely slower than the spinlocks on the benchmarks I tried.

I think the problem is that we're using these spinlocks to synchronise 
across all cores, and it's likely that these loops will have to spin for 
a while before exiting because one or more of the running cores takes a 
while to get to a safe point.  But really giving up the core and 
blocking is a lot worse, because the wakeup time is so long (you can see 
it pretty clearly in ThreadScope).
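
For readers who have not met the futex interface mentioned above, here is a
minimal, Linux-only sketch of blocking on a flag with it. It is not the patch
attached to the ticket and not GHC RTS code; the names flag_wait, flag_set and
the small futex wrapper are made up for illustration, and the code assumes
atomic_int has the size and layout of a 32-bit int, as it does with glibc on
Linux.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <limits.h>
#include <linux/futex.h>
#include <stdatomic.h>
#include <stddef.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Thin wrapper (hypothetical name): glibc exports no futex() function,
   so the raw system call is used. */
static long futex(atomic_int *addr, int op, int val)
{
    return syscall(SYS_futex, addr, op, val, NULL, NULL, 0);
}

/* Block until *flag is nonzero. FUTEX_WAIT only sleeps if *flag still
   equals 0 when the kernel checks it, so a concurrent flag_set() cannot
   be missed; the loop absorbs early returns (EAGAIN, EINTR). */
void flag_wait(atomic_int *flag)
{
    assert(flag != NULL);
    while (atomic_load_explicit(flag, memory_order_acquire) == 0)
        futex(flag, FUTEX_WAIT, 0);
}

/* Set the flag, then wake every thread sleeping on it. */
void flag_set(atomic_int *flag)
{
    assert(flag != NULL);
    atomic_store_explicit(flag, 1, memory_order_release);
    futex(flag, FUTEX_WAKE, INT_MAX);
}
```

A waiter that reaches FUTEX_WAIT gives up its core until another thread calls
flag_set, which is the wakeup cost described above; a spinning waiter never
pays it but burns the core in the meantime.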


Anyway, I hope all this is just a temporary problem until we get 
CPU-independent GC working.


Cheers,
Simon
___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe


Re: [Haskell-cafe] Re: GHC's parallel garbage collector -- what am I doing wrong?

2010-03-07 Thread Jan-Willem Maessen

On Mar 3, 2010, at 8:44 AM, Simon Marlow wrote:

> On 01/03/2010 21:20, Michael Lesniak wrote:
>> Hello Bryan,
>> 
>>> The parallel GC currently doesn't behave well with concurrent programs that
>>> use multiple capabilities (aka OS threads), and the behaviour you see is
>>> the known symptom of this. I believe that Simon Marlow has some fixes in
>>> hand that may go into 6.12.2.
> 
> It's more correct to say the parallel GC has difficulty when one of its 
> threads is descheduled by the OS, because the other threads just spin waiting 
> for it.  Presumably some kernels are more susceptible than others due to 
> differences in scheduling policy; I know they've been fiddling around with 
> this a lot in Linux recently.
> 
> You typically don't see a problem when there are spare cores; the slowdown 
> manifests when you are trying to use all the cores in your machine, so it 
> affects people on dual-cores quite a lot. This probably explains why I've not 
> been particularly affected by this myself, since I do most of my benchmarking 
> on an 8-core box.
> 
> The fix that will be in 6.12.2 is to insert some yields, so that threads will 
> yield rather than spinning indefinitely, and this seems to help a lot.

Be warned that inserting yield into a spin loop is also non-portable, and may 
make the problem *worse* on some systems.

The problem is that "yield" calls can be taken by the scheduler to mean "See, 
I'm a nice thread, giving up the core when I don't need it.  Please give me 
extra Scheduling Dubloons."

Now let's say 7 of your 8 threads are doing this.  It's likely that each one 
will yield to the next, and the 8th thread (the one you actually want 
on-processor) could take a long time to bubble up and get its moment.  At one 
time on Solaris you could even livelock (because the scheduler didn't try 
particularly hard to be fair in the case of multiple yielding threads in a 
single process)---but that was admittedly a long time ago.

The only recourse I know about is to tell the OS you're doing synchronization 
(by using OS-visible locking calls, say the ones in pthreads or some of the 
lightweight calls that Linux has added for the purpose).  Obviously this has a 
cost if anyone falls out of the spin loop---and it's pretty likely some thread 
will have to wait a while.
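
As a rough illustration of "OS-visible locking calls", the sketch below waits
on a pthreads mutex and condition variable instead of spinning, so the kernel
knows which threads are blocked and on what. The gate_t type and the gate_*
function names are hypothetical, chosen only for this example; nothing here is
taken from the GHC runtime.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* A one-shot gate: waiters sleep in the kernel until it is opened. */
typedef struct {
    pthread_mutex_t lock;
    pthread_cond_t  opened;
    bool            is_open;
} gate_t;

void gate_init(gate_t *g)
{
    int rc = pthread_mutex_init(&g->lock, NULL);
    assert(rc == 0);
    rc = pthread_cond_init(&g->opened, NULL);
    assert(rc == 0);
    (void)rc;                 /* silence unused warning under NDEBUG */
    g->is_open = false;
}

/* Block (without spinning) until some thread calls gate_open(). */
void gate_wait(gate_t *g)
{
    pthread_mutex_lock(&g->lock);
    while (!g->is_open)       /* loop guards against spurious wakeups */
        pthread_cond_wait(&g->opened, &g->lock);
    pthread_mutex_unlock(&g->lock);
}

/* Open the gate and wake all current waiters. */
void gate_open(gate_t *g)
{
    pthread_mutex_lock(&g->lock);
    g->is_open = true;
    pthread_cond_broadcast(&g->opened);
    pthread_mutex_unlock(&g->lock);
}
```

The cost mentioned above shows up in gate_wait: a thread that arrives before
the gate opens is descheduled and has to be woken again, which takes far
longer than leaving a spin loop.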

-Jan-Willem Maessen



[Haskell-cafe] Re: GHC's parallel garbage collector -- what am I doing wrong?

2010-03-03 Thread Simon Marlow

On 01/03/2010 21:20, Michael Lesniak wrote:

Hello Bryan,


The parallel GC currently doesn't behave well with concurrent programs that
use multiple capabilities (aka OS threads), and the behaviour you see is
the known symptom of this. I believe that Simon Marlow has some fixes in
hand that may go into 6.12.2.


It's more correct to say the parallel GC has difficulty when one of its 
threads is descheduled by the OS, because the other threads just spin 
waiting for it.  Presumably some kernels are more susceptible than 
others due to differences in scheduling policy; I know they've been 
fiddling around with this a lot in Linux recently.


You typically don't see a problem when there are spare cores; the 
slowdown manifests when you are trying to use all the cores in your 
machine, so it affects people on dual-cores quite a lot. This probably 
explains why I've not been particularly affected by this myself, since I 
do most of my benchmarking on an 8-core box.


The fix that will be in 6.12.2 is to insert some yields, so that threads 
will yield rather than spinning indefinitely, and this seems to help a lot.
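
To make the idea concrete, here is a minimal sketch of a wait loop that spins
for a bounded number of iterations and then yields the core. It only
illustrates the general technique and is not the code going into 6.12.2; the
function name, the SPINS_BEFORE_YIELD constant and the max_yields bound are
all assumptions made for this example.

```c
#include <assert.h>
#include <sched.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical tuning knob, not a value taken from GHC. */
enum { SPINS_BEFORE_YIELD = 1000 };

/* Wait for *flag to become nonzero. Busy-spin for a bounded number of
   iterations, then call sched_yield() so that a descheduled peer gets a
   chance to run. Returns true once the flag is seen set, or false after
   max_yields yields have passed without seeing it. */
bool wait_spin_then_yield(atomic_int *flag, unsigned max_yields)
{
    assert(flag != NULL);
    unsigned yields = 0;
    for (;;) {
        for (int i = 0; i < SPINS_BEFORE_YIELD; i++) {
            if (atomic_load_explicit(flag, memory_order_acquire) != 0)
                return true;
        }
        if (yields == max_yields)
            return false;
        sched_yield();        /* give up the core instead of spinning on */
        yields++;
    }
}
```

The max_yields bound is there only so the sketch can be exercised in
isolation; a real barrier would keep waiting until the flag is set.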


Cheers,
Simon