As it turns out, I ran into a similar issue with a concurrent Gibbs
sampling implementation I've been working on. Increasing -H fixed the
regression, as expected. I'd be happy to provide data if someone was
interested.
Yes, please. Even if it turns out not to be a ThreadScope issue, it's still
a
Tom, thank you very much for the ThreadScope feedback.
Anything new? Anybody? We are close to a new release,
so that's the last call for bug reports before the release.
Stay tuned. :)
On Fri, Dec 16, 2011 at 11:34, Tom Thorne thomas.thorn...@gmail.com wrote:
Hi,
I can't remember if it was
On Mon, 9 Jan 2012 18:22:57 +0100, Mikolaj Konarski
mikolaj.konar...@gmail.com wrote:
Tom, thank you very much for the ThreadScope feedback.
Anything new? Anybody? We are close to a new release,
so that's the last call for bug reports before the release.
Stay tuned. :)
As it turns out, I ran
On Mon, Oct 10, 2011 at 15:55, Tom Thorne thomas.thorn...@gmail.com wrote:
Yes I will try to run threadscope on it, I tried it before and the event log
output produced about 1.8GB, and then crashed.
Hi Tom,
I'm one of the TS/ghc-events hackers and I'd like to learn more,
fix it or at least
Hi,
I can't remember if it was threadscope that crashed or the RTS, since I was
also having segfaults in the RTS because of this bug, that is fixed in 7.2.2:
http://hackage.haskell.org/trac/ghc/ticket/5552
I successfully used threadscope by running my code for fewer iterations to
produce a
Thanks, I would except that my code, whilst pure, uses hmatrix, and hmatrix
uses lapack internally and so presumably calls FFI functions. As far as I
know lapack ought to be thread safe, but potentially the way it interfaces
with haskell in hmatrix isn't. I don't want to blame hmatrix since it is
The speedup is around 6 times on a 12 core machine, which I think is pretty
decent given that the parallelised section is only a part of my code. The
nested parMaps were left over from a previous implementation, I have moved
to using just the inner one, since the outer map doesn't divide the work
Yes I will try to run threadscope on it, I tried it before and the event log
output produced about 1.8GB, and then crashed.
Is there any way to tell the RTS to perform GC less often? My code doesn't
use too much memory and I'm using fairly hefty machines (e.g. one with 48
cores and 128GB of RAM)
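
One way to check whether a larger allocation area (the -A setting mentioned elsewhere in this thread) really makes the RTS collect less often is to have the program print its own GC counters. The following is only a minimal sketch using GHC.Stats from base; the workload function and the file name GcReport.hs are hypothetical placeholders, not code from the thread.

```haskell
-- Sketch: report how often the RTS collected, so that runs with different
-- allocation-area sizes can be compared side by side.
-- Build (assumed file name): ghc -O2 -threaded -rtsopts GcReport.hs
-- Run:                       ./GcReport +RTS -N12 -A32M -T -RTS
import Data.List (foldl')
import GHC.Stats (RTSStats (..), getRTSStats, getRTSStatsEnabled)

-- Share of elapsed time spent in GC, as a fraction between 0 and 1.
gcFraction :: Double -> Double -> Double
gcFraction gcNs mutNs
  | total <= 0 = 0
  | otherwise  = gcNs / total
  where
    total = gcNs + mutNs

-- Hypothetical placeholder workload: allocates enough to trigger collections.
workload :: Int -> Int
workload n = foldl' (+) 0 (map (length . show) [1 .. n])

main :: IO ()
main = do
  print (workload 2000000)
  enabled <- getRTSStatsEnabled
  if not enabled
    then putStrLn "Run with +RTS -T to collect GC statistics."
    else do
      s <- getRTSStats
      putStrLn ("collections:       " ++ show (gcs s))
      putStrLn ("major collections: " ++ show (major_gcs s))
      putStrLn ("GC time fraction:  "
                ++ show (gcFraction (fromIntegral (gc_elapsed_ns s))
                                    (fromIntegral (mutator_elapsed_ns s))))
```

Running the same binary with and without -A32M and comparing the "collections" line gives a rough indication of whether the setting has the intended effect on a given machine.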
On Mon, Oct 10, 2011 at 3:55 PM, Tom Thorne thomas.thorn...@gmail.com wrote:
Yes I will try to run threadscope on it, I tried it before and the event log
output produced about 1.8GB, and then crashed.
Is there any way to tell the RTS to perform GC less often? My code doesn't
use too much
thanks! I just tried setting -A32M and this seems to fix the parallel GC
problems, I now get a speedup with parallel GC on and performance is the
same as passing -qg. I had tried -H before and it only made things worse,
but -A seems to do the trick.
I'm still having problems with segmentation
On 08/10/2011 01:47, austin seipp wrote:
It's GHC, and partly the OS scheduler in some sense. Oversaturating,
i.e. using an -N option greater than your number of logical cores (including
hyperthreads) will slow down your program typically. This isn't
uncommon, and is well known - GHC's lightweight threads
On 10/10/2011 15:44, Tom Thorne wrote:
thanks! I just tried setting -A32M and this seems to fix the parallel GC
problems, I now get a speedup with parallel GC on and performance is the
same as passing -qg. I had tried -H before and it only made things
worse, but -A seems to do the trick.
I'm
On Mon, Oct 10, 2011 at 16:44, Tom Thorne thomas.thorn...@gmail.com wrote:
thanks! I just tried setting -A32M and this seems to fix the parallel GC
problems, I now get a speedup with parallel GC on and performance is the
same as passing -qg. I had tried -H before and it only made things worse,
It would be really useful to see the threadscope output for this.
Apart from cache effects (which may well be significant at 12 cores),
the usual problems with parallel GHC are synchronisation.
When GHC wants to perform a parallel GC it needs to stop all Haskell
threads. These are lightweight
I have made a dummy program that seems to exhibit the same GC
slowdown behavior, minus the segmentation faults. Compiling with -threaded
and running with -N12 I get very bad performance (3x slower than -N1),
running with -N12 -qg it runs approximately 3 times faster than -N1. I don't
know if I
I'm not sure if this is at all related, but if I run a small Repa program
with more threads than I have cores/CPUs then it gets drastically slower, I
have a dual core laptop - and -N2 makes my small program take approximately
0.6 of the time. Increasing to -N4 and we're running about 2x the time,
I am guessing that it is slowdown caused by GC needing to co-ordinate with
blocked threads. That requires lots of re-scheduling to happen in the
kernel.
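
A program can also guard against oversubscription itself by never running with more capabilities than the machine has processors. This is a minimal sketch under that assumption, using only functions from base; cappedCapabilities is a hypothetical helper and not something proposed in the thread.

```haskell
-- Sketch: clamp the number of capabilities to the number of processors,
-- so that an overly large -N value does not oversaturate the machine.
import Control.Concurrent (getNumCapabilities, setNumCapabilities)
import Control.Monad (when)
import GHC.Conc (getNumProcessors)

-- Hypothetical helper: never more capabilities than processors,
-- and never fewer than one.
cappedCapabilities :: Int -> Int -> Int
cappedCapabilities requested processors = max 1 (min requested processors)

main :: IO ()
main = do
  procs <- getNumProcessors      -- processors visible to the RTS
  caps  <- getNumCapabilities    -- value requested via +RTS -N
  let target = cappedCapabilities caps procs
  -- Only touch the RTS setting when the requested value is too high.
  when (target /= caps) (setNumCapabilities target)
  putStrLn ("processors: " ++ show procs
            ++ ", capabilities in use: " ++ show target)
```

The clamp only addresses the case where -N exceeds the core count; it does not help when other processes compete for the same cores.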
This is a hard problem I think, but also increasingly important as
virtualization becomes more important and the number of schedulable cores
It's GHC, and partly the OS scheduler in some sense. Oversaturating,
i.e. using an -N option greater than your number of logical cores (including
hyperthreads) will slow down your program typically. This isn't
uncommon, and is well known - GHC's lightweight threads have an M:N
threading model, but for good
I'm trying to narrow it down so that I can submit a meaningful bug report,
and it seems to be something to do with switching off parallel GC using -qg,
whilst also passing -Nx.
Are there any known issues with this that people are aware of? At the moment
I am using the latest haskell platform
I am having some strange performance issues when using SMP parallelism, that
I think may be something to do with GC. Apologies for the large readouts
below but I'm not familiar enough to know what is and isn't relevant!
I have a pure function that is mapped over a list of around 10 values, and
Hi Tom,
I think debugging this sort of problem is exactly what we need to be doing
(and making easier). Have you tried Duncan's newest version of Threadscope
by the way?
It looks like -- completely aside from the GC time -- this program is not
scaling. The mutator time itself, disregarding GC,
I don't know if this is relevant to your problems, but I'm currently
struggling to get some performance out of a parallel - or rather,
concurrent - program.
Basically, the initial thread parses some data into an IntMap, and then
multiple threads access this read-only to do the Real Work.
Now,
Ketil,
For your particular problem, unevaluated thunks should be easy
to check: dump a heap profile and look for a decreasing allocation
of thunks.
That being said, IntMap is spine strict, so that will all be evaluated,
and if your threads are accessing disjoint keys there should be no
Thanks for the reply, I haven't actually tried threadscope yet, I will have
a look at that tomorrow at some point. I also had no idea you could use
valgrind on haskell programs, so I will look into that as well.
I think the program certainly does have problems scaling, since I made a
very basic
On Wed, Oct 5, 2011 at 2:37 PM, Tom Thorne thomas.thorn...@gmail.com wrote:
The only problem is that now I am getting random occasional segmentation
faults that I was not getting before, and once got a message saying:
Main: schedule: re-entered unsafely
Perhaps a 'foreign import unsafe'