On Thu, 2017-12-07 at 12:39 +0000, Silva João wrote:
> > If you have 39 tasks in Runnable state, I guess that they are not all
> > blocked in libc read ?
> > So, you might investigate which task(s) are really still doing something
> > by doing e.g.
> >    thread apply all bt
> > and/or put breakpoints at places that you know should soon be
> > encountered by the runnable tasks and continue the execution.
> > Then press Control-C and redo the above to see whether some tasks are
> > still making progress.
> > 
> > Also, from gdb, you can do
> >    monitor v.info scheduler
> > to have the valgrind status of the tasks/threads.
> 
> Thanks for the commands.
> 
> All 38 threads are waiting on the 1st one (pthread_cond_wait). The first one 
> is blocked on the file read.
Then you have to understand what this task is doing.
Doesn't the backtrace show what the code is doing and what this
read could be?
Look at the file descriptor it is reading from and find out what this fd is.
Is it a real file? (unlikely to block then)
Is it a pipe? A TCP/IP connection?
Use lsof if you cannot determine from gdb what this fd is for.
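
If lsof is not at hand, the same information is available under /proc on
Linux. Below is a minimal sketch, not a definitive tool: the helper name
describe_fd is made up for illustration, and it assumes a Linux /proc
filesystem and permission to inspect the target process. Under valgrind,
the pid to use is that of the valgrind process (the number shown between
the dashes in valgrind's output), and the fd is the first argument of the
read seen in the backtrace.

```python
import os
import stat


def describe_fd(pid, fd):
    """Return (kind, target) for file descriptor `fd` of process `pid`.

    Hypothetical helper for illustration; Linux only (relies on /proc).
    """
    path = f"/proc/{pid}/fd/{fd}"
    # The symlink text names the object, e.g. "pipe:[12345]",
    # "socket:[678]" or a plain path for a regular file.
    target = os.readlink(path)
    # os.stat follows the link, so this is the mode of the open file itself.
    mode = os.stat(path).st_mode
    if stat.S_ISFIFO(mode):
        kind = "pipe/fifo"
    elif stat.S_ISSOCK(mode):
        kind = "socket"
    elif stat.S_ISCHR(mode):
        kind = "character device"
    elif stat.S_ISREG(mode):
        kind = "regular file"
    else:
        kind = "other"
    return kind, target
```

For example, describe_fd(20282, 5) would describe fd 5 of process 20282
(the fd number 5 is only a placeholder here). A pipe or a socket is a much
more plausible candidate for a read that never returns than a regular file.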

And then you have to guess why this read does not return.

> 
> > You can also use the option --trace-sched=yes to see how and if valgrind
> > still schedules the threads.
> > 
> > Note that at my work, we are using valgrind + Ada tasks without
> > any particular problem.
> 
> The trace-sched option produces a lot of output continuously:
> 
> --20282--   SCHED[1]: releasing lock (VG_(scheduler):timeslice) -> VgTs_Yielding
> --20282--   SCHED[1]:  acquired lock (VG_(scheduler):timeslice)
> --20282--   SCHED[1]: releasing lock (VG_(client_syscall)[async]) -> VgTs_WaitSys
> --20282--   SCHED[1]:  acquired lock (VG_(client_syscall)[async])
This shows that on the valgrind side, task 1 is not blocked and valgrind
still lets it run.
Now, you might understand what is happening with the above suggestions
(gdb backtrace, lsof, ...).
If the above does not clarify things, you might learn more or get a hint
by adding
  --trace-syscalls=yes --trace-signals=yes 

You might also use strace to compare the syscalls executed natively with
those executed under valgrind. In this comparison, you will have to take
into account that valgrind changes the way threads are scheduled, and that
valgrind issues some syscalls of its own for internal housekeeping.
So, the comparison is not mechanical ...
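
As a starting point for such a comparison, here is a rough sketch that
counts syscalls per name in two strace logs and reports the names whose
counts differ. It assumes logs in the usual strace text format, e.g.
produced with "strace -f -o native.txt ./app" and
"strace -f -o vg.txt valgrind ./app" (./app and the file names are
placeholders). The function names are made up for illustration.

```python
import re
from collections import Counter

# Syscall name at the start of an strace line, with an optional pid prefix
# as written by "strace -f -o FILE" ("1234 read(...") or "[pid 1234] ".
# "<... read resumed>" continuations, signal lines ("--- SIG... ---") and
# exit lines ("+++ ... +++") do not match, so each call is counted once.
_SYSCALL = re.compile(r"^(?:\[pid\s+\d+\]\s+|\d+\s+)?([a-z_][a-z0-9_]*)\(")


def count_syscalls(lines):
    """Count how many times each syscall name starts a line."""
    counts = Counter()
    for line in lines:
        m = _SYSCALL.match(line)
        if m:
            counts[m.group(1)] += 1
    return counts


def compare(native_lines, valgrind_lines):
    """Return {syscall: (native_count, valgrind_count)} where counts differ."""
    native = count_syscalls(native_lines)
    vg = count_syscalls(valgrind_lines)
    return {
        name: (native[name], vg[name])
        for name in sorted(set(native) | set(vg))
        if native[name] != vg[name]
    }
```

Usage would be something like
compare(open("native.txt"), open("vg.txt")). Differences are expected
because of valgrind's own syscalls and the different scheduling, so the
result only tells you where to look, e.g. a read that appears in the
native run but never completes under valgrind.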

 
> 
> > Possibly also, the change in the way the threads are scheduled causes an
> > application deadlock.
> > 
> > You might thus also try --tool=helgrind just in case this would reveal
> > some non thread safe bug ...
> 
> There seems to be some thread errors like:
> 
> Lock at 0xD92BC0 was first observed
> Possible data race during read of size 8 at 0x7B459FD8 by thread #1
> This conflicts with a previous write of size 8 by thread #2
> 
> Or can these be false positives?
These can be false positives of course, and of course, they can be true
positives :).
With only an address, no access to the code, no backtrace, no reproducer,
there is not much feedback we can give.

Let me just say that at my work, we have added a few helgrind suppression
entries related to the 'low level implementation of the gnat runtime', to
suppress false positives caused by the low level inner workings of the
runtime.

To see what your case is, the minimum needed would be the stack traces of
the error messages.

In summary, at this point, it looks like you have to debug your application
when running under valgrind, and then you might determine if what you see
is a real application bug, or a valgrind bug/limitation e.g. in the valgrind
scheduler/signal handling/syscall handling or whatever.

At this stage, without further info, let's assume you have
an application bug :)

Philippe


_______________________________________________
Valgrind-users mailing list
Valgrind-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/valgrind-users
