On Thu, 2012-11-08 at 00:18 +0100, David Faure wrote: > On Wednesday 07 November 2012 23:00:51 Philippe Waroquiers wrote: > > "discover the bug" is related to the doubful construct, not > > to a race condition > > If there's no race condition and no deadlock, I'm not sure what bug you want > to detect :-) The "doubtful construct". Such kind of construct is in my opinion better removed, and better detect that via a tool.
> > > Note that currently, laog is producing messages which should > > be considered as "lock order warning", not as > > "for sure there is a deadlock order problem". > > Yes, but why does it warn about lock order? Because it could cause deadlocks. Not necessarily. Doubtful construct like the above could cause e.g. livelocks if the "inverted" (failing) trylocks are not properly handled. So, even if a wrong order does not necessarily cause a deadlock, it can be a good idea to detect these doubtful constructions and remove them. > > I agree, this is about potential deadlocks, not actual deadlocks. But > trylock 1 + trylock 2 vs trylock2 + trylock1 (the case we're talking about in > this part of the mail) is not even a potential deadlock. It can't ever > deadlock. So there's nothing to warn about. A compiler produces warnings and errors. warnings are useful to detect potential problems. Similarly, a wrong lock order (even if not deadlocking) might be something you do not want. And so, have e.g. an option to continue to warn for such a case (i.e. continue to have the current helgrind behaviour) would be useful for many users. > > > The trylock is one case of "warning, not an error" which could/should > > be improved. > > But there are others e.g. laog does not understand the concept of > > "guard locks" which is (IIUC): each thread can acquire a single lock > > in a set of locks. If a thread wants to acquire more than one lock > > (in any order then), it first has to acquire the "guard lock", > > and then can lock in any order any nr of locks in the lock set. > > With this guard lock, not possible to have a deadlock, > > but for sure this is not understood by the current helgrind > > laog algorithm. > > Right. That one definitely needs annotations in the source code, I would > think. > There's no way for the tool to detect that these mutexes all go together. Search on the web for "good lock algorithm", which solves the problem of guard lock detection in deadlock lock graph analysis. > > I gave it a try, but I'm hitting a problem with exactly that, passing > isTryLock to that code. isTryLock is set in HG_PTHREAD_MUTEX_LOCK_PRE and > similar, while the above code is called from HG_PTHREAD_MUTEX_LOCK_POST and > similar. If I understand correctly, adding an argument to the _POST variant > would break source compatibility for the existing userland macros? I do not think it breaks compatibility for two reasons: 1. I believe these are "system client requests", executed by helgrind interception code, not by client code. 2. it is possible to "add an argument", because the requests all have a fixed nr of arguments, with "unused arguments" passed as 0. As long as "really using" such an "unused" argument keeps the same semantic for the 0 value, it is ok to add such arguments. (e.g. I have in 3.7.0 "added" a new argument to the memcheck leak search client request to do delta leak search, without breaking the compatibility: 0 means full leak search 1 means delta leak search > I'll finish the patch then, but only if you agree with the approach, > otherwise > this would be dead code, i.e. a wasted effort. At this point you don't seem > fully convinced :) Because I believe the below "3 threads 3 locks" is the counter-example which shows that simply adding the trylock in the graph does not work. Note that convincing me is neither a sufficient nor necessary condition to have a complex helgrind patch accepted :). > > > But I suspect that the insertion of a trylock in the graph > > might later on cause a 'wrong' cycle to be detected. > > E.g. (L = lock, T = trylock, L and T followed by lock nr) > > threadA L1 T2 > > threadB L2 L3 > > threadC L3 L1 > > cannot deadlock (I think :) if threadA releases lock 1 > > when T2 fails. > > Well, if T2 fails then we have no cycle, and if it succeeds we have a real > potential deadlock. The question is whether we want to remember the T2 > attempt > (and warn later) even when T2 fails. I would say, if it failed, it's like it > didn't happen. If T2 succeeds, there is of course no deadlock : in this case, threadA can do the required work, and then release the locks. So, with T2, the above cannot deadlock (if we assume the heuristic of threadA is to release L1 when T2 fails). If the heuristic of threadA is rather to just retry T2, then we have a deadlock. So, in summary: if T2 succeeds, for sure no deadlock (at least for this run :). if T2 fails, then depending on threadA heuristic, we might have a deadlock (probably better to call it a livelock, as threadA will continue to burn CPU to re-trylock 2 :). Now, with the above, let's imagine we apply the rule: When inserting a trylock, do not compute cycles but just insert it "as a normal lock operation". If the 3 threads execute in sequence (fully), what will be inserted is L1 L2 L2 L3 L3 L1 At this time, the graph has "forgotten" that L2 done by threadA is in fact a T2. And so, the last insertion of L1 will not be able to detect that this is not a deadlock : a cycle will be found, while according to the assumed threadA heuristic, it will never deadlock So, I am still not convinced that T2 handling by "just add L2, but do not check cycle" will work. > Do you actually store failed trylocks currently? > > > But when L3 L1 will be inserted, a cycle will be found > > via T2 (if the graph has not remembered this is a trylock). > > So, I am still (somewhat intuitively) thinking that we need > > to have nodes and/or edges marked with "this is a trylock" > > and have the graph exploration taking these marking into account > > to not generate such "false warnings". > > My idea is rather that a successful trylock is just like a lock (except that > it shouldn't warn at that precise moment, but if any later real-lock happens > out of order with it, then we should warn, this is why successful trylocks > should be recorded), and a failed trylock should NOT be recorded. > So at this point I don't see a need for remembering "this was a trylock". I > don't see how this could matter later on. See above. It looks like we need to clarify what is the expected "application behaviour" that the helgrind "lock detection order" is supposed to "simulate" and so detect warnings and/or errors. The current laog algorithm is based on the following application model: * when a trylock fails, the application just retries this trylock, it keeps all the other locks. For such a model, the current laog algorithm works, because a trylock can be handled exactly like a normal lock. If we do not want to handle a trylock like a normal lock, we have to define what is the heuristic that the application is supposed to apply for a failing trylock. As shown above, depending on this, the algorithm will have to differ. The example threadA T1 T2 threadB T2 T1 cannot "deadlock", but depending on the heuristic can still "livelock" (not very different of a deadlock, except it consumes more cpu). So, do we want helgrind to warn or not for that case ? (cfr initial discussion about what helgrind is supposed to warn for ?). Philippe ------------------------------------------------------------------------------ Everyone hates slow websites. So do we. Make your web apps faster with AppDynamics Download AppDynamics Lite for free today: http://p.sf.net/sfu/appdyn_d2d_nov _______________________________________________ Valgrind-users mailing list Valgrind-users@lists.sourceforge.net https://lists.sourceforge.net/lists/listinfo/valgrind-users