Re: [Haskell-cafe] SMP parallelism increasing GC time dramatically

2012-01-10 Thread Mikolaj Konarski
As it turns out, I ran into a similar issue with a concurrent Gibbs sampling implementation I've been working on. Increasing -H fixed the regression, as expected. I'd be happy to provide data if someone was interested. Yes, please. Even if it turns out not to be a ThreadScope issue, it's still a

Re: [Haskell-cafe] SMP parallelism increasing GC time dramatically

2012-01-09 Thread Mikolaj Konarski
Tom, thank you very much for the ThreadScope feedback. Anything new? Anybody? We are close to a new release, so that's the last call for bug reports before the release. Stay tuned. :) On Fri, Dec 16, 2011 at 11:34, Tom Thorne thomas.thorn...@gmail.com wrote: Hi, I can't remember if it was

Re: [Haskell-cafe] SMP parallelism increasing GC time dramatically

2012-01-09 Thread Ben Gamari
On Mon, 9 Jan 2012 18:22:57 +0100, Mikolaj Konarski mikolaj.konar...@gmail.com wrote: Tom, thank you very much for the ThreadScope feedback. Anything new? Anybody? We are close to a new release, so that's the last call for bug reports before the release. Stay tuned. :) As it turns out, I ran

Re: [Haskell-cafe] SMP parallelism increasing GC time dramatically

2011-12-16 Thread Mikolaj Konarski
On Mon, Oct 10, 2011 at 15:55, Tom Thorne thomas.thorn...@gmail.com wrote: Yes I will try to run threadscope on it, I tried it before and the event log output produced about 1.8GB, and then crashed. Hi Tom, I'm one of the TS/ghc-events hackers and I'd like to learn more, fix it or at least

Re: [Haskell-cafe] SMP parallelism increasing GC time dramatically

2011-12-16 Thread Tom Thorne
Hi, I can't remember if it was threadscope that crashed or the RTS, since I was also having segfaults in the RTS because of this bug, that is fixed in 7.2.2: http://hackage.haskell.org/trac/ghc/ticket/5552 I successfully used threadscope by running my code for fewer iterations to produce a

Re: [Haskell-cafe] SMP parallelism increasing GC time dramatically

2011-10-12 Thread Tom Thorne
Thanks, I would except that my code, whilst pure, uses hmatrix, and hmatrix uses lapack internally and so presumably calls FFI functions. As far as I know lapack ought to be thread safe, but potentially the way it interfaces with haskell in hmatrix isn't. I don't want to blame hmatrix since it is

Re: [Haskell-cafe] SMP parallelism increasing GC time dramatically

2011-10-12 Thread Tom Thorne
The speedup is around 6 times on a 12 core machine, which I think is pretty decent given that the parallelised section is only a part of my code. The nested parMaps were left over from a previous implementation, I have moved to using just the inner one, since the outer map doesn't divide the work
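
A minimal sketch of the difference between nesting parallel maps and parallelising only the inner one. This is not the code from the message: `parMapWHNF` is a hand-rolled stand-in built on `par` and `pseq` from base (the original presumably used `parMap` from the parallel package), and `work`, `nested` and `innerOnly` are hypothetical names chosen for illustration.

```haskell
import GHC.Conc (par, pseq)

-- Hand-rolled stand-in for parMap: sparks each element to weak head
-- normal form while the rest of the list is being built.
parMapWHNF :: (a -> b) -> [a] -> [b]
parMapWHNF _ []     = []
parMapWHNF f (x:xs) = y `par` (ys `pseq` (y : ys))
  where
    y  = f x
    ys = parMapWHNF f xs

-- Hypothetical workload standing in for the real pure function.
work :: Int -> Int
work n = sum [1 .. n]

-- Nested version: sparks at the outer and the inner level.
nested :: [[Int]] -> [[Int]]
nested = parMapWHNF (parMapWHNF work)

-- Inner-only version: a sequential outer map, sparks only inside.
innerOnly :: [[Int]] -> [[Int]]
innerOnly = map (parMapWHNF work)

main :: IO ()
main = do
  let input = [[1000, 2000], [3000]]
  -- Both variants compute the same lists; only the sparking differs.
  print (nested input == innerOnly input)
```

Sparks only run in parallel when the program is compiled with -threaded and started with +RTS -N; otherwise both variants simply evaluate sequentially.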

Re: [Haskell-cafe] SMP parallelism increasing GC time dramatically

2011-10-10 Thread Tom Thorne
Yes I will try to run threadscope on it, I tried it before and the event log output produced about 1.8GB, and then crashed. Is there any way to tell the RTS to perform GC less often? My code doesn't use too much memory and I'm using fairly hefty machines (e.g. one with 48 cores and 128GB of RAM)

Re: [Haskell-cafe] SMP parallelism increasing GC time dramatically

2011-10-10 Thread Gregory Collins
On Mon, Oct 10, 2011 at 3:55 PM, Tom Thorne thomas.thorn...@gmail.com wrote: Yes I will try to run threadscope on it, I tried it before and the event log output produced about 1.8GB, and then crashed. Is there any way to tell the RTS to perform GC less often? My code doesn't use too much

Re: [Haskell-cafe] SMP parallelism increasing GC time dramatically

2011-10-10 Thread Tom Thorne
thanks! I just tried setting -A32M and this seems to fix the parallel GC problems, I now get a speedup with parallel GC on and performance is the same as passing -qg. I had tried -H before and it only made things worse, but -A seems to do the trick. I'm still having problems with segmentation

Re: [Haskell-cafe] SMP parallelism increasing GC time dramatically

2011-10-10 Thread Simon Marlow
On 08/10/2011 01:47, austin seipp wrote: It's GHC, and partly the OS scheduler in some sense. Oversaturating, i.e. using an -N option greater than your number of logical cores (including hyperthreads), will typically slow down your program. This isn't uncommon, and is well known - GHC's lightweight threads

Re: [Haskell-cafe] SMP parallelism increasing GC time dramatically

2011-10-10 Thread Simon Marlow
On 10/10/2011 15:44, Tom Thorne wrote: thanks! I just tried setting -A32M and this seems to fix the parallel GC problems, I now get a speedup with parallel GC on and performance is the same as passing -qg. I had tried -H before and it only made things worse, but -A seems to do the trick. I'm

Re: [Haskell-cafe] SMP parallelism increasing GC time dramatically

2011-10-10 Thread Erik Hesselink
On Mon, Oct 10, 2011 at 16:44, Tom Thorne thomas.thorn...@gmail.com wrote: thanks! I just tried setting -A32M and this seems to fix the parallel GC problems, I now get a speedup with parallel GC on and performance is the same as passing -qg. I had tried -H before and it only made things worse,

Re: [Haskell-cafe] SMP parallelism increasing GC time dramatically

2011-10-09 Thread Thomas Schilling
It would be really useful to see the threadscope output for this. Apart from cache effects (which may well be significant at 12 cores), the usual problems with parallel GHC are synchronisation. When GHC wants to perform a parallel GC it needs to stop all Haskell threads. These are lightweight

Re: [Haskell-cafe] SMP parallelism increasing GC time dramatically

2011-10-07 Thread Tom Thorne
I have made a dummy program that seems to exhibit the same GC slowdown behavior, minus the segmentation faults. Compiling with -threaded and running with -N12 I get very bad performance (3x slower than -N1), running with -N12 -qg it runs approximately 3 times faster than -N1. I don't know if I

Re: [Haskell-cafe] SMP parallelism increasing GC time dramatically

2011-10-07 Thread Oliver Batchelor
I'm not sure if this is at all related, but if I run a small Repa program with more threads than I have cores/CPUs then it gets drastically slower, I have a dual core laptop - and -N2 makes my small program take approximately 0.6 of the time. Increasing to -N4 and we're running about 2x the time,

Re: [Haskell-cafe] SMP parallelism increasing GC time dramatically

2011-10-07 Thread Alexander Kjeldaas
I am guessing that it is slowdown caused by GC needing to co-ordinate with blocked threads. That requires lots of re-scheduling to happen in the kernel. This is a hard problem I think, but also increasingly important as virtualization becomes more important and the number of schedulable cores

Re: [Haskell-cafe] SMP parallelism increasing GC time dramatically

2011-10-07 Thread austin seipp
It's GHC, and partly the OS scheduler in some sense. Oversaturating, i.e. using an -N option greater than your number of logical cores (including hyperthreads), will typically slow down your program. This isn't uncommon, and is well known - GHC's lightweight threads have an M:N threading model, but for good
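
A minimal sketch of how a program could detect the oversaturation described above at startup. It assumes a GHC recent enough to export `getNumProcessors` from `GHC.Conc`; the helper name `oversaturated` is hypothetical and not part of any library.

```haskell
import GHC.Conc (getNumCapabilities, getNumProcessors)

-- | True when the RTS has more capabilities (set with +RTS -N<x>) than the
-- machine has logical processors.
oversaturated :: Int -> Int -> Bool
oversaturated caps procs = caps > procs

main :: IO ()
main = do
  caps  <- getNumCapabilities  -- number of capabilities the RTS is using
  procs <- getNumProcessors    -- logical processors as seen by the RTS
  putStrLn ("capabilities: " ++ show caps ++ ", processors: " ++ show procs)
  if oversaturated caps procs
    then putStrLn "warning: -N exceeds the number of logical processors"
    else putStrLn "ok: -N does not exceed the number of logical processors"
```

Compile with -threaded and -rtsopts so that different -N values can be tried on the command line.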

Re: [Haskell-cafe] SMP parallelism increasing GC time dramatically

2011-10-06 Thread Tom Thorne
I'm trying to narrow it down so that I can submit a meaningful bug report, and it seems to be something to do with switching off parallel GC using -qg, whilst also passing -Nx. Are there any known issues with this that people are aware of? At the moment I am using the latest haskell platform

[Haskell-cafe] SMP parallelism increasing GC time dramatically

2011-10-05 Thread Tom Thorne
I am having some strange performance issues when using SMP parallelism, that I think may be something to do with GC. Apologies for the large readouts below but I'm not familiar enough to know what is and isn't relevant! I have a pure function that is mapped over a list of around 10 values, and

Re: [Haskell-cafe] SMP parallelism increasing GC time dramatically

2011-10-05 Thread Ryan Newton
Hi Tom, I think debugging this sort of problem is exactly what we need to be doing (and making easier). Have you tried Duncan's newest version of Threadscope by the way? It looks like -- completely aside from the GC time -- this program is not scaling. The mutator time itself, disregarding GC,

Re: [Haskell-cafe] SMP parallelism increasing GC time dramatically

2011-10-05 Thread Ketil Malde
I don't know if this is relevant to your problems, but I'm currently struggling to get some performance out of a parallel - or rather, concurrent - program. Basically, the initial thread parses some data into an IntMap, and then multiple threads access this read-only to do the Real Work. Now,

Re: [Haskell-cafe] SMP parallelism increasing GC time dramatically

2011-10-05 Thread Edward Z. Yang
Ketil, For your particular problem, unevaluated thunks should be easy to check: dump a heap profile and look for a decreasing allocation of thunks. That being said, IntMap is spine strict, so that will all be evaluated, and if your threads are accessing disjoint keys there should be no
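
A small sketch of what spine strictness does and does not guarantee, assuming the lazy `Data.IntMap` API from the containers package. The names `lazyValues` and `forceValues` are hypothetical. The spine and keys are built eagerly, but the stored values can still be unevaluated thunks until something forces them.

```haskell
import qualified Data.IntMap as IM

-- Every value is a thunk that would fail if forced; the spine is still
-- fully built, so size can be computed without touching the values.
lazyValues :: IM.IntMap Int
lazyValues = IM.fromList [(k, error "value was forced") | k <- [1 .. 3]]

-- Force every stored value to weak head normal form, then return the map.
forceValues :: IM.IntMap a -> IM.IntMap a
forceValues m = IM.foldr seq () m `seq` m

main :: IO ()
main = do
  -- Only the spine is inspected here, so no value is evaluated.
  print (IM.size lazyValues)
  -- Forcing the values up front means reader threads see evaluated data.
  let table = forceValues (IM.fromList [(k, k * k) | k <- [1 .. 10]])
  print (IM.lookup 3 table)
```

Forcing the values in the initial thread is one way to make sure worker threads do not end up evaluating shared thunks; whether that matters for the program discussed here is not something this sketch can establish.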

Re: [Haskell-cafe] SMP parallelism increasing GC time dramatically

2011-10-05 Thread Tom Thorne
Thanks for the reply, I haven't actually tried threadscope yet, I will have a look at that tomorrow at some point. I also had no idea you could use valgrind on haskell programs, so I will look into that as well. I think the program certainly does have problems scaling, since I made a very basic

Re: [Haskell-cafe] SMP parallelism increasing GC time dramatically

2011-10-05 Thread Johan Tibell
On Wed, Oct 5, 2011 at 2:37 PM, Tom Thorne thomas.thorn...@gmail.com wrote: The only problem is that now I am getting random occasional segmentation faults that I had not been getting before, and once got a message saying: Main: schedule: re-entered unsafely Perhaps a 'foreign import unsafe'