On Thu, 2012-11-08 at 00:18 +0100, David Faure wrote:
> On Wednesday 07 November 2012 23:00:51 Philippe Waroquiers wrote:
> > "discover the bug" is related to the doubful construct, not
> > to a race condition
> 
> If there's no race condition and no deadlock, I'm not sure what bug you want 
> to detect :-)
The "doubtful construct". Such kind of construct is in my opinion better
removed, and better detect that via a tool.

> 
> > Note that currently, laog is producing messages which should
> > be considered as "lock order warning", not as
> > "for sure there is a deadlock order problem".
> 
> Yes, but why does it warn about lock order? Because it could cause deadlocks.
Not necessarily. Doubtful construct like the above could cause e.g.
livelocks if the "inverted" (failing) trylocks are not properly handled.
So, even if a wrong order does not necessarily cause a deadlock, it
can be a good idea to detect these doubtful constructions and remove
them.


> 
> I agree, this is about potential deadlocks, not actual deadlocks. But
> trylock 1 + trylock 2 vs trylock2 + trylock1 (the case we're talking about in 
> this part of the mail) is not even a potential deadlock. It can't ever 
> deadlock. So there's nothing to warn about.
A compiler produces warnings and errors. warnings are useful to detect
potential problems. Similarly, a wrong lock order (even if not
deadlocking) might be something you do not want. And so, have e.g. an
option to continue to warn for such a case (i.e. continue to have the
current helgrind behaviour) would be useful for many users.

> 
> > The trylock is one case of "warning, not an error" which could/should
> > be improved.
> > But there are others e.g. laog does not understand the concept of
> > "guard locks" which is (IIUC): each thread can acquire a single lock
> > in a set of locks. If a thread wants to acquire more than one lock
> > (in any order then), it first has to acquire the "guard lock",
> > and then can lock in any order any nr of locks in the lock set.
> > With this guard lock, not possible to have a deadlock,
> > but for sure this is not understood by the current helgrind
> > laog algorithm.
> 
> Right. That one definitely needs annotations in the source code, I would 
> think.
> There's no way for the tool to detect that these mutexes all go together.
Search on the web for "good lock algorithm", which solves the problem
of guard lock detection in deadlock lock graph analysis.


> 
> I gave it a try, but I'm hitting a problem with exactly that, passing 
> isTryLock to that code. isTryLock is set in HG_PTHREAD_MUTEX_LOCK_PRE and 
> similar, while the above code is called from HG_PTHREAD_MUTEX_LOCK_POST and 
> similar. If I understand correctly, adding an argument to the _POST variant 
> would break source compatibility for the existing userland macros?
I do not think it breaks compatibility for two reasons:
  1. I believe these are "system client requests", executed by
     helgrind interception code, not by client code.
  2. it is possible to "add an argument", because the requests all
    have a fixed nr of arguments, with "unused arguments" passed as 0.
    As long as "really using" such an "unused" argument keeps the same
    semantic for the 0 value, it is ok to add such arguments.
    (e.g. I have in 3.7.0 "added" a new argument to the memcheck leak
     search client request to do delta leak search, without breaking
     the compatibility:
         0 means full leak search
         1 means delta leak search


> I'll finish the patch then, but only if you agree with the approach, 
> otherwise 
> this would be dead code, i.e. a wasted effort. At this point you don't seem 
> fully convinced :)
Because I believe the below "3 threads 3 locks" is the counter-example
which shows that simply adding the trylock in the graph does not
work.
Note that convincing me is neither a sufficient nor necessary condition
to have a complex helgrind patch accepted :).


> 
> > But I suspect that the insertion of a trylock in the graph
> > might later on cause a 'wrong' cycle to be detected.
> > E.g. (L = lock, T = trylock, L and T followed by lock nr)
> >   threadA L1 T2
> >   threadB L2 L3
> >   threadC L3 L1
> > cannot deadlock (I think :) if threadA releases lock 1
> > when T2 fails.
> 
> Well, if T2 fails then we have no cycle, and if it succeeds we have a real 
> potential deadlock. The question is whether we want to remember the T2 
> attempt 
> (and warn later) even when T2 fails. I would say, if it failed, it's like it 
> didn't happen.
If T2 succeeds, there is of course no deadlock : in this case, threadA
can do the required work, and then release the locks.
So, with T2, the above cannot deadlock (if we assume the heuristic
of threadA is to release L1 when T2 fails). If the heuristic of threadA
is rather to just retry T2, then we have a deadlock.
So, in summary:
   if T2 succeeds, for sure no deadlock (at least for this run :).
   if T2 fails, then depending on threadA heuristic, we might
   have a deadlock (probably better to call it  a livelock,
   as threadA will continue to  burn CPU to re-trylock 2 :).

Now, with the above, let's imagine we apply the rule:
  When inserting a trylock, do not compute cycles but just insert
  it "as a normal lock operation".
If the 3 threads execute in sequence (fully), what will be
inserted is
   L1 L2
          L2 L3
                 L3 L1
At this time, the graph has "forgotten" that L2 done by threadA
is in fact a T2.
And so, the last insertion of L1 will not be able to detect
that this is not a deadlock : a cycle will be found, while
according to the assumed threadA heuristic, it will never deadlock

So, I am still not convinced that T2 handling by
   "just add L2, but do not check cycle"
will work.

> Do you actually store failed trylocks currently?
> 
> > But when L3 L1 will be inserted, a cycle will be found
> > via T2 (if the graph has not remembered this is a trylock).
> > So, I am still (somewhat intuitively) thinking that we need
> > to have nodes and/or edges marked with "this is a trylock"
> > and have the graph exploration taking these marking into account
> > to not generate such "false warnings".
> 
> My idea is rather that a successful trylock is just like a lock (except that 
> it shouldn't warn at that precise moment, but if any later real-lock happens 
> out of order with it, then we should warn, this is why successful trylocks 
> should be recorded), and a failed trylock should NOT be recorded.
> So at this point I don't see a need for remembering "this was a trylock". I 
> don't see how this could matter later on.
See above.

It looks like we need to clarify what is the expected "application
behaviour" that the helgrind "lock detection order" is supposed to
"simulate" and so detect warnings and/or errors.

The current laog algorithm is based on the following application model:
* when a trylock fails, the application just retries this trylock,
  it keeps all the other locks.
For such a model, the current laog algorithm works, because a trylock
can be handled exactly like a normal lock.

If we do not want to handle a trylock like a normal lock, we have to
define what is the heuristic that the application is supposed to apply
for a failing trylock. As shown above, depending on this, the algorithm
will have to differ.

The example
   threadA   T1 T2
   threadB   T2 T1
cannot "deadlock", but depending on the heuristic can still "livelock"
(not very different of a deadlock, except it consumes more cpu).
So, do we want helgrind to warn or not for that case ?
(cfr initial discussion about what helgrind is supposed to warn for ?).

Philippe




------------------------------------------------------------------------------
Everyone hates slow websites. So do we.
Make your web apps faster with AppDynamics
Download AppDynamics Lite for free today:
http://p.sf.net/sfu/appdyn_d2d_nov
_______________________________________________
Valgrind-users mailing list
Valgrind-users@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/valgrind-users

Reply via email to