Re: [Haskell-cafe] debugging memory corruption

2012-12-02 Thread Evan Laforge

Thanks for the response.

On Sat, Dec 1, 2012 at 5:23 PM, Alexander Kjeldaas
alexander.kjeld...@gmail.com wrote:

> What I've mostly done in similar circumstances (jni)
>
> 1. Create an interface (virtual functions or template) for the FFI in C++
> that covers everything you use. Then create one test implementation and one
> real implementation. The test implementation must allocate resources
> whenever the real FFI does so. Doing memory allocation works. This makes it
> possible to test all your FFI in C++ using valgrind.

If I understand correctly, this sounds like what I was talking about,
i.e. to stub out the C++ side and drive that from haskell to try to
repro.  That way I don't have to have windows popping up and do the
simulation at the level of mouse clicks.  The danger is that it turns
out to be lots of work to implement, but still somehow doesn't
reproduce the problem.  That could happen if the bug is in C++, but
only turns up during manual manipulation.
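
The stubbing idea above can be sketched in C++ roughly as follows. This is only an illustration: the `Gui` interface, its three operations, and `FakeGui` are hypothetical names, not taken from the actual project. The fake opens no windows, but it still allocates and frees heap memory wherever a real implementation would, so a memory checker has something to watch.

```cpp
#include <cstddef>
#include <cstring>
#include <map>
#include <memory>

// Hypothetical interface covering everything the FFI layer calls.
class Gui {
public:
    virtual ~Gui() = default;
    virtual int create_window(const char *title) = 0;  // returns a handle
    virtual bool set_title(int handle, const char *title) = 0;
    virtual bool destroy_window(int handle) = 0;
};

// Test implementation: no real windows, but every title is copied into
// freshly allocated storage, mirroring the allocations of a real GUI.
class FakeGui : public Gui {
public:
    int create_window(const char *title) override {
        int handle = next_handle_++;
        windows_[handle] = copy(title);
        return handle;
    }
    bool set_title(int handle, const char *title) override {
        auto it = windows_.find(handle);
        if (it == windows_.end()) return false;  // unknown handle
        it->second = copy(title);                // frees the old copy
        return true;
    }
    bool destroy_window(int handle) override {
        return windows_.erase(handle) == 1;
    }
    // Inspection helpers for tests.
    std::size_t live_windows() const { return windows_.size(); }
    const char *title(int handle) const {
        auto it = windows_.find(handle);
        return it == windows_.end() ? nullptr : it->second.get();
    }

private:
    static std::unique_ptr<char[]> copy(const char *s) {
        std::size_t n = std::strlen(s) + 1;
        std::unique_ptr<char[]> buf(new char[n]);
        std::memcpy(buf.get(), s, n);
        return buf;
    }
    int next_handle_ = 1;
    std::map<int, std::unique_ptr<char[]>> windows_;
};
```

The exported C entry points would then forward to whichever `Gui` is installed, so the same Haskell FFI code can drive either the real GUI or the fake.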

Or maybe you're talking about the other way around, stub out the
haskell and replace it with C++ and then run that in valgrind?  That
seems unlikely to be helpful, because if the bug is in the haskell FFI
code then rewriting that all in C++ is just going to replace it with
possibly also buggy C++ code.

It seems to me like valgrind just plain doesn't work for haskell,
maybe because the ghc runtime uses its own allocator?  So if the bug
is in haskell I can't find it with valgrind.  If the bug is in C++,
well, I already have a pure C++ version (that talks to the C++
interface in a very simplistic way), and it can run under valgrind,
which doesn't turn up any out of bounds errors.
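
Where valgrind stays silent, one cheap complementary check is to put guard bytes around any buffer that crosses the FFI boundary and verify them afterwards. The sketch below assumes a hypothetical `GuardedBuffer` class. It only catches writes that land within the guard regions, and only at the moment `intact()` is called.

```cpp
#include <cstddef>
#include <cstring>
#include <vector>

namespace {
constexpr std::size_t kGuard = 16;         // guard bytes on each side
constexpr unsigned char kPattern = 0xA5;   // value the guards must keep
}

// A buffer with guard bytes before and after the usable region.
class GuardedBuffer {
public:
    explicit GuardedBuffer(std::size_t size)
        : size_(size), storage_(size + 2 * kGuard, kPattern) {
        std::memset(data(), 0, size_);  // zero only the usable region
    }

    // Pointer to hand to the other side of the FFI.
    unsigned char *data() { return storage_.data() + kGuard; }
    std::size_t size() const { return size_; }

    // True if nobody has written into either guard region.
    bool intact() const {
        for (std::size_t i = 0; i < kGuard; ++i) {
            if (storage_[i] != kPattern) return false;
            if (storage_[kGuard + size_ + i] != kPattern) return false;
        }
        return true;
    }

private:
    std::size_t size_;
    std::vector<unsigned char> storage_;
};
```

Calling `intact()` after every foreign call that touched the buffer narrows a stray write down to a single call, instead of waiting for the GC to trip over the damage.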

> 2. Add tracing support to the real implementation and replay support to the
> test implementation.

I'm not sure this would work, since the whole thing is that the bug is
nondeterministic.  I feel like the only way to get it to come out is
to do a bunch of random stuff for a period of time.  It's likely that
whether it happens or not depends on the memory layout for that
particular run, and as far as I know you can't make that consistent.
Or can you?
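
For reference, the trace-and-replay suggestion might look something like the sketch below. `Call`, `Tracer`, `parse_trace`, and `replay` are hypothetical names, and each call is reduced to an operation name plus one integer argument. Replaying the same calls does not make the heap layout identical between runs, so this narrows the input without guaranteeing a reproduction.

```cpp
#include <functional>
#include <sstream>
#include <string>
#include <vector>

// One recorded FFI call: operation name plus a single integer argument.
struct Call {
    std::string op;
    int arg;
};

// Used by the real implementation: remembers every call in order.
class Tracer {
public:
    void record(const std::string &op, int arg) { calls_.push_back({op, arg}); }

    // Serialize as one "op arg" line per call, suitable for a log file.
    std::string dump() const {
        std::ostringstream out;
        for (const Call &c : calls_) out << c.op << ' ' << c.arg << '\n';
        return out.str();
    }

private:
    std::vector<Call> calls_;
};

// Parse the text produced by Tracer::dump back into calls.
std::vector<Call> parse_trace(const std::string &text) {
    std::vector<Call> calls;
    std::istringstream in(text);
    Call c;
    while (in >> c.op >> c.arg) calls.push_back(c);
    return calls;
}

// Feed recorded calls, in order, to the test implementation's handler.
void replay(const std::vector<Call> &calls,
            const std::function<void(const Call &)> &handler) {
    for (const Call &c : calls) handler(c);
}
```

A trace captured from an interactive session that crashed can then be replayed against the test implementation under valgrind, with no windows or mouse clicks involved.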

> 3. Upload to Hackage.

Is the suggestion that people who love debugging hard problems will
swarm out of the woodwork and help me find the problem?  I should be
so lucky :)

___
Haskell-Cafe mailing list
Haskell-Cafe@haskell.org
http://www.haskell.org/mailman/listinfo/haskell-cafe

[Haskell-cafe] debugging memory corruption

2012-12-01 Thread Evan Laforge

Ever since upgrading to 7.6.1 I regularly get panics like this:

seq: internal error: evacuate: strange closure type -1958168540
(GHC version 7.6.1 for x86_64_apple_darwin)
Please report this as a GHC bug:  http://www.haskell.org/ghc/reportabug

I've seen some variations, but basically I think it just means someone
is corrupting memory and it goes unnoticed until the GC trips over it.

This happens infrequently (maybe once in 15m, very roughly, it's
extremely inconsistent) in non-optimized code, and frequently (maybe
once in 5m) in optimized code.  This only happens during interactive
use, not during testing or profiling.

I had a problem like this once before, and it took a very long time to
track down.  And in fact, I never really tracked it down, I just got a
test that could semi-reliably reproduce it, and then by trial and
error discovered that if I changed the alignment of a particular
Storable instance from 1 to 4, the problem stopped happening (and 1
should have been correct, it was a struct of chars).  Not exactly a
satisfying solution, and now I'm thinking all I did was make ghc 6
stop manifesting it, and with 7 it's back again.
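
One way to take the guesswork out of a Storable instance is to have the C++ compiler report the layout and compare that against the Haskell side at startup. The sketch below uses two hypothetical structs, `Rgba` (a struct of chars) and `Mixed`; the exported function names are likewise made up for illustration.

```cpp
#include <cstddef>

// Hypothetical struct of chars, standing in for the one behind the
// Storable instance.
struct Rgba {
    unsigned char r, g, b, a;
};

// A char followed by an int, to show where padding can appear.
struct Mixed {
    char tag;
    int value;
};

// Exported so the Haskell side can compare these against sizeOf,
// alignment, and the byte offsets used in peek and poke.
extern "C" std::size_t rgba_sizeof() { return sizeof(Rgba); }
extern "C" std::size_t rgba_alignof() { return alignof(Rgba); }
extern "C" std::size_t rgba_offset_a() { return offsetof(Rgba, a); }
extern "C" std::size_t mixed_sizeof() { return sizeof(Mixed); }
extern "C" std::size_t mixed_offset_value() { return offsetof(Mixed, value); }
```

On common x86_64 ABIs `rgba_alignof()` is expected to be 1, which agrees with the claim that 1 should have been correct for a struct of chars. If so, the change to 4 most likely moved the corruption somewhere harmless rather than removing it. A startup check that each `sizeOf` and offset on the Haskell side equals the corresponding exported value would catch a mismatched instance directly.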

I'm most suspicious of the FFI usage since it's easy to corrupt memory
in C++ and even easier to write a bogus Storable instance that does
the same, but I really have no idea what or where.  My first thought
was to start cutting out portions (especially FFI-using portions) to
try to isolate it, but it's slow going because it can sometimes take
quite a while for the bug to come out again.  My second thought was
that I need a more automated way to reproduce it, but it's nontrivial
because it only comes up when I'm using the interactive GUI parts,
which are also incidentally a big chunk of FFI.  And even if I do get
a repro, as I did before, it just means I can more quickly poke
randomly at things hoping they change it, but even if I can get it to
stop happening it doesn't mean I understood it, or even really fixed
it.  This is also the kind of bug (well, it was last time), which is
highly dependent on the code, so add one print and it stops happening.
I have a fairly complicated scheme where I pass a wrapped haskell
function callback along with a wrapped freeHaskellFunPtr that frees
the callback and then itself; maybe it's something to do with that.
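
To make the callback scheme easier to reason about, the C++ side could funnel all use of the two function pointers through one owner that frees them exactly once and never while the callback is still running. The sketch below is an assumption about how that might look, not the actual code: `CallbackHolder`, `Callback`, and `FreeFn` are hypothetical, and `FreeFn` stands in for the wrapped freeHaskellFunPtr.

```cpp
// Function-pointer types as the C++ side might see them (assumed signatures).
using Callback = void (*)(int);
using FreeFn = void (*)(void (*)());  // frees a pointer made by the Haskell side

// Owns a callback and the function that frees it.
class CallbackHolder {
public:
    CallbackHolder(Callback cb, FreeFn free_fn) : cb_(cb), free_fn_(free_fn) {}
    CallbackHolder(const CallbackHolder &) = delete;
    CallbackHolder &operator=(const CallbackHolder &) = delete;
    ~CallbackHolder() { release(); }

    // Returns false instead of calling through a pointer already freed.
    bool invoke(int arg) {
        if (!cb_) return false;
        ++depth_;
        cb_(arg);
        --depth_;
        if (depth_ == 0 && release_pending_) release();
        return true;
    }

    // Frees the callback, then the free function itself, at most once.
    // If the callback is currently running, the free is deferred.
    void release() {
        if (depth_ > 0) {
            release_pending_ = true;
            return;
        }
        if (!cb_) return;  // already released
        FreeFn free_fn = free_fn_;
        void (*cb)() = reinterpret_cast<void (*)()>(cb_);
        cb_ = nullptr;
        free_fn_ = nullptr;
        release_pending_ = false;
        free_fn(cb);
        free_fn(reinterpret_cast<void (*)()>(free_fn));
    }

private:
    Callback cb_;
    FreeFn free_fn_;
    int depth_ = 0;
    bool release_pending_ = false;
};
```

If the crashes stop once every free goes through a holder like this, that points at a double free or a free during a callback. If they continue, the scheme is probably not the culprit. The final step, where the free function frees itself, is kept here only because the original scheme does it; whether that is safe is worth checking separately.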

Anyone out there with ideas or advice on how to track down this kind
of bug?  My next thought is to try to automate the GUI parts, or maybe
just the FFI part, so I can write a program to randomly fuss with it
until it crashes.  I've also tried valgrind, but it doesn't report
anything suspicious.  But it also doesn't seem to work on FFI Storable
corruption, I've tried intentionally inserting a bad poke and valgrind
still won't report it.
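
A program that randomly fusses with the FFI is much more useful if each run can be repeated, which a seeded generator gives for the sequence of operations (though not for the memory layout). The sketch below uses hypothetical names, `random_ops` and `find_failing_seed`; operations are just indices into whatever table of FFI actions the harness defines.

```cpp
#include <cstddef>
#include <cstdint>
#include <functional>
#include <random>
#include <vector>

// Generate a reproducible sequence of operation indices from a seed.
// std::mt19937 is fully specified, so the same seed gives the same
// sequence on every platform.
std::vector<int> random_ops(std::uint32_t seed, std::size_t count, int num_ops) {
    std::mt19937 rng(seed);
    std::vector<int> ops;
    ops.reserve(count);
    for (std::size_t i = 0; i < count; ++i) {
        ops.push_back(static_cast<int>(rng() % static_cast<std::uint32_t>(num_ops)));
    }
    return ops;
}

// Try consecutive seeds; return the first seed whose run reports a
// failure (run returns false), or -1 if every run passed.
std::int64_t find_failing_seed(
        std::uint32_t first_seed, int runs, std::size_t ops_per_run, int num_ops,
        const std::function<bool(const std::vector<int> &)> &run) {
    for (int i = 0; i < runs; ++i) {
        std::uint32_t seed = first_seed + static_cast<std::uint32_t>(i);
        if (!run(random_ops(seed, ops_per_run, num_ops))) {
            return static_cast<std::int64_t>(seed);
        }
    }
    return -1;
}
```

Since a GC panic kills the whole process, a real harness would also need to log each seed before starting the run, so the failing one survives the crash.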

Thanks in advance for any insight!


Actually, there's a whole other discussion which has been nagging at
me for a while, though another thread would be more appropriate.  But
in short it's that it feels like hsc2hs is just too low level, and too
error-prone.  It's tempting to use because it comes with ghc, but it
seems bad to tell people haskell is a safe language, but as soon as
you want to talk to C you're writing totally unchecked pokes and
peeks.  Maybe I should go evaluate the alternatives like c2hs, or
maybe safety features can be added to hsc2hs.  Wouldn't it be nice to
have ghc come with a high level and safe FFI language?


Re: [Haskell-cafe] debugging memory corruption

2012-12-01 Thread Alexander Kjeldaas

What I've mostly done in similar circumstances (jni)

1. Create an interface (virtual functions or template) for the FFI in C++
that covers everything you use. Then create one test implementation and one
real implementation. The test implementation must allocate resources
whenever the real FFI does so. Doing memory allocation works. This makes it
possible to test all your FFI in C++ using valgrind.

2. Add tracing support to the real implementation and replay support to the
test implementation.

3. Upload to Hackage.

Alexander
On Dec 1, 2012 5:06 PM, Evan Laforge qdun...@gmail.com wrote:

 Ever since upgrading to 7.6.1 I regularly get panics like this:

 [...]

