Thank you very much, Philippe,
The --fair-sched option was set in an attempt to fix this. I had read
about interminable FUTEX_WAIT status and I think that was one of the
suggestions. Clearly it doesn't make any difference.
I think I've tried 3.9.0, but I will double-check and run that one from now
on anyway.
I have tried connecting with gdb and there wasn't much visible. I'll try
again though and also try vgdb - I was unaware of this tool.
Not sure what is getting locked, whether it's Valgrind or our code. We do
use threading but only in a limited way, and I'm pretty sure memcheck is
hanging up on single-threaded cases. Hopefully the extra logging etc will
reveal something. I can't easily log onto the machine from here - I'll run
the experiments you suggest and report back in a short while.
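
One quick check that could be run alongside those experiments is whether fd 1027 from the strace output really is a pipe, as the big lock would be with --fair-sched=no. This is a minimal sketch assuming a Linux /proc filesystem. The helper names are hypothetical, and it only identifies the kind of file behind the descriptor. It does not prove that the descriptor is Valgrind's lock.

```python
import os

def describe_fd(pid, fd):
    """Return the target of /proc/<pid>/fd/<fd> (Linux only).

    For a pipe the kernel reports something like 'pipe:[123456]';
    for a regular file it reports the file's path.
    """
    return os.readlink(f"/proc/{pid}/fd/{fd}")

def is_pipe(pid, fd):
    """True if the descriptor refers to a pipe."""
    return describe_fd(pid, fd).startswith("pipe:")
```

For the hung process above this would be called as describe_fd(5223, 1027), run as the same user or as root.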
One thing I didn't mention, which might be important, is that I run
valgrind through a Python-driven process pool. I use the multiprocessing
module to spawn off a bunch of valgrinds. I don't think it's relevant, as it
was working fine for several weeks like this before the hang-ups started.
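
For illustration, a pool of this kind could put a time limit on each run, so that a hung memcheck is reported instead of being left for weeks. This is a sketch under assumptions, not the actual driver script: the function names, the timeout value, the pool size and the log-file naming are all hypothetical, and only the valgrind options already quoted in this thread are used.

```python
import subprocess
import sys
from multiprocessing import Pool

TIMEOUT_SECONDS = 1800  # hypothetical limit; normal runs take a few minutes

def valgrind_cmd(binary, log_file):
    """Build a memcheck command line using options quoted in this thread."""
    return ["valgrind", "--quiet", "--track-origins=yes",
            f"--log-file={log_file}", binary]

def run_one(cmd, timeout=TIMEOUT_SECONDS):
    """Run one command and return (cmd, status).

    status is the exit code, or the string 'timeout' if the run exceeded
    the limit (subprocess.run kills the child before raising).
    """
    try:
        done = subprocess.run(cmd, timeout=timeout)
        return cmd, done.returncode
    except subprocess.TimeoutExpired:
        return cmd, "timeout"

if __name__ == "__main__":
    # Binaries to check are taken from the command line (an assumption).
    jobs = [valgrind_cmd(b, f"{b}.vglog") for b in sys.argv[1:]]
    if jobs:
        with Pool(processes=4) as pool:
            for cmd, status in pool.imap_unordered(run_one, jobs):
                print(status, " ".join(cmd))
```

Any job reported as 'timeout' is then a candidate for a rerun with the extra tracing options.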
Best wishes and thanks again,
David.
On Sun, Jan 26, 2014 at 1:07 PM, Philippe Waroquiers <
philippe.waroqui...@skynet.be> wrote:
> On Sun, 2014-01-26 at 02:20 +0000, David Carter wrote:
> > Hi,
> >
> >
> > I've got an issue with memcheck in Valgrind 3.8.1 hanging. I've left
> > processes running for weeks or even months but they don't complete
> > (normally these processes run in a few minutes tops, and they were
> > working fine with memcheck until a while ago).
> >
> >
> > Has anyone seen anything like this before? Here are the details:
> >
> >
> > options:
> >
> > --quiet --track-origins=yes --free-fill=7a
> > --child-silent-after-fork=yes --fair-sched=no --log-file=/path/to/log
> > --suppressions=/path/to/suppression.file
> >
> >
> >
> > strace shows:
> >
> > Process 5223 attached - interrupt to quit
> >
> > read(1027,
> With --fair-sched=no, valgrind uses a pipe to implement a "big lock".
> However, it is not clear from what you have shown whether this 1027 is
> the valgrind pipe big lock fd. If it is, then it looks like a bug in
> valgrind, as the above read means a thread wants to acquire the big
> lock to run, but the thread currently holding the lock does not
> release it.
>
> Here are various suggestions:
> 1. when you are in the above blocked state, use gdb+vgdb
> to connect to your process, and examine the state
> of your process (e.g. which thread is doing what)
> (the most likely cause of deadlock/problem is your application, not
> valgrind, at least when looking at your mail with
> a "valgrind developer hat on" :).
>
> 2. upgrade to 3.9.0; many bugs have been fixed since 3.8.1
> (probably not yours, as I do not see anything related to deadlock,
> but one never knows).
>
> 3. run with a lot more traces e.g.
> -v -v -v -d -d -d --trace-sched=yes --trace-syscalls=yes
> --trace-signals=yes
> and see if there is some suspicious output.
>
> Philippe
>
>
>
>